openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Bad handling of UTF-8 IRIs in faceted browser #772

Open jakubklimek opened 6 years ago

jakubklimek commented 6 years ago

This relates to #141, #142, #345, and #346. Faceted browser 1.13.91 (the latest version in develop/7 for the past few years) still has issues handling UTF-8 characters in IRIs.

For example, at https://data.gov.cz/zdroj/datová-schránka/x7cab34, look at the entity type: (image)

Also, the "documents" list, which shows the named graphs the data comes from, looks like this: (image)

Clicking it mangles the IRI even more. The link becomes

https://data.gov.cz/describe/?url=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A1-schr%C3%A1nka%2Fx7cab34&graph=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%83%C2%A1-sada%2Fseznam-ovm

instead of

https://data.gov.cz/describe/?url=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A1-schr%C3%A1nka%2Fx7cab34&graph=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A1-sada%2Fseznam-ovm

Note the %C3%83%C2%A1 in the graph parameter where %C3%A1 is expected. There are more issues like this, making it impossible to browse such data.
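The %C3%83%C2%A1 sequence is what you get when the UTF-8 bytes of "á" are misread as Latin-1 and then encoded as UTF-8 a second time. The following is only a minimal Python sketch of that suspected mechanism, not the faceted browser's actual code; the function names `double_encode` and `repair` are hypothetical and exist only for illustration.

```python
from urllib.parse import quote, unquote


def double_encode(iri: str) -> str:
    """Simulate the suspected bug (an assumption, not Virtuoso code):
    UTF-8 bytes are misread as Latin-1, then percent-encoded as UTF-8 again."""
    misread = iri.encode("utf-8").decode("latin-1")
    return quote(misread, safe="")


def repair(mangled: str) -> str:
    """Undo one round of double encoding on a percent-encoded value."""
    text = unquote(mangled)  # percent-decoding yields the mojibake string
    return text.encode("latin-1").decode("utf-8")


graph = "https://data.gov.cz/zdroj/datová-sada/seznam-ovm"

correct = quote(graph, safe="")   # single, correct percent-encoding
mangled = double_encode(graph)    # reproduces the broken graph parameter

print(correct)
print(mangled)
print(repair(mangled) == graph)
```

With the graph IRI from this report, `correct` contains %C3%A1 and `mangled` contains %C3%83%C2%A1, matching the two links above.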

@pkleef Is there any chance of fixing this? Some of these issues were reported more than 4 years ago and, as mentioned in https://github.com/openlink/virtuoso-opensource/issues/618#issuecomment-270274120, it is not updated in VOS anymore. Our only alternative is to develop our own simple LOD browser that handles media types and UTF-8 correctly.

pkleef commented 6 years ago

@jakubklimek I did not have enough time to merge this into the 7.2.5 release, but I have scheduled the merge as part of the current development cycle.

pkleef commented 6 years ago

@jakubklimek As promised, I merged a number of fct-related patches into the develop/7 tree, so I invite you to have a look when you have a moment.

jakubklimek commented 6 years ago

@pkleef Regarding IRIs: after upgrading to fct 1.16.99, there seems to be no improvement. See https://slovník.gov.cz/legislativní/sbírka/424/1991/pojem/politická-strana

image

The IRI of the class is still mangled, the data space URL does not support Punycode, and then there is the new issue #787.
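For reference, here is a rough Python sketch of the IRI-to-URI mapping that Punycode support would involve for an IRI like the one above: the host is converted with IDNA and the path is percent-encoded as UTF-8. The helper name `iri_to_uri` is hypothetical, the sketch ignores ports and userinfo, and it relies on Python's built-in "idna" codec (IDNA 2003), so it is an illustration under those assumptions rather than what the faceted browser does.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified: no port or userinfo)."""
    parts = urlsplit(iri)
    # Convert each host label to its Punycode ("xn--") form.
    host = parts.hostname.encode("idna").decode("ascii")
    # Percent-encode non-ASCII path characters as UTF-8, keeping "/".
    path = quote(parts.path, safe="/")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


iri = "https://slovník.gov.cz/legislativní/sbírka/424/1991/pojem/politická-strana"
uri = iri_to_uri(iri)
print(uri)
```

The resulting URI is pure ASCII, and decoding its host with the same codec gives back slovník.gov.cz.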