openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com
Other

Faceted Browser UTF-8 encoding issue #141

Open jakubklimek opened 10 years ago

jakubklimek commented 10 years ago

Property and object labels show international characters with the wrong encoding. Seen with fct 1.13.56 at commit 7d38d124560a8df1802edd1dccc534b76c21d044.

See the second object of the last property (should be Březiněves, as it is when I click on it - the second screenshot) image http://ruian.linked.opendata.cz/resource/obce/554782

image http://ruian.linked.opendata.cz/resource/katastralni-uzemi/614131

However, on the last page of http://ruian.linked.opendata.cz/resource/katastralni-uzemi/614131 (http://ruian.linked.opendata.cz/describe/?url=http%3A%2F%2Fruian.linked.opendata.cz%2Fresource%2Fkatastralni-uzemi%2F614131&p=1&lp=15&op=-1&last=&gp=1) the same characters are OK, which seems a bit strange: image

HughWilliams commented 10 years ago

Is this still an issue? I do not see the UTF-8 encoding problems shown in your screenshots when I access the live pages in my browser.

/Users/hwilliams/Dropbox/Screenshots/Screenshot 2014-03-09 13.29.35.png

jakubklimek commented 10 years ago

I still have this issue when I access http://ruian.linked.opendata.cz/resource/obce/554782 image in fct 1.13.57

jakubklimek commented 10 years ago

This is still happening in 8173801092e9fd4abc06d32e642e6d7d72b4bf40 in fct 1.13.59 http://ruian.linked.opendata.cz/resource/obce/554782 image

HughWilliams commented 10 years ago

Has anything changed in the Virtuoso Server instance? The page http://ruian.linked.opendata.cz/resource/obce/554782 does not load with the corrupt characters shown in your screenshot above:

screenshot 2014-06-22 21 37 44

jakubklimek commented 10 years ago

No, but the texts are in Czech. When I view the page from here, I guess they load because of my location, and they don't load for you. I think adding &lang=cs to the URL should do the trick.

jakubklimek commented 9 years ago

@HughWilliams This is still happening. The database is here; don't forget the &lang=cs, as you will not be autodetected as being in a cs environment.

jakubklimek commented 9 years ago

Plus, now it looks like this (notice the font): image. This is probably the same bug that was fixed in a614876e4fc02a1f9ee256f5492dd357f8c648fa; however, that fix was in the SPARQL endpoint, and in fct the bug is still there. Oddly, it shows up only in some Chrome instances.

HughWilliams commented 9 years ago

@jakubklimek: What is the URL to load? I assume it is http://localhost:8890/resource/obce/554782, having set up the test database on http://localhost:8890, but I get a 404 error:

Error HTTP/1.1 404 File not found The requested URL was not found URI = '/resource/obce/554782'

jakubklimek commented 9 years ago

@HughWilliams That is rewritten by Apache to form correct and resolvable Linked Data URIs. For your instance it will be http://localhost:8890/describe/?url=http://ruian.linked.opendata.cz/resource/obce/554782&lang=cs
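For readers reproducing this locally, here is a minimal Python sketch of how such a describe URL can be assembled. The function name `describe_url` and its parameters are hypothetical, and the sketch percent-encodes the `url` parameter (the form seen in the paginated link earlier in the thread), whereas the URL quoted above passes it unencoded.

```python
from urllib.parse import urlencode


def describe_url(base, resource, lang=None):
    """Build a /describe/ URL for a resource IRI (hypothetical helper).

    base     -- root of the Virtuoso instance, e.g. "http://localhost:8890"
    resource -- the Linked Data IRI to describe
    lang     -- optional label language, e.g. "cs"
    """
    params = {"url": resource}
    if lang:
        params["lang"] = lang
    # urlencode percent-encodes ':' and '/' inside the resource IRI.
    return f"{base.rstrip('/')}/describe/?{urlencode(params)}"


if __name__ == "__main__":
    print(describe_url(
        "http://localhost:8890",
        "http://ruian.linked.opendata.cz/resource/obce/554782",
        lang="cs",
    ))
```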

HughWilliams commented 9 years ago

Loading the page http://localhost:8890/describe/?url=http://ruian.linked.opendata.cz/resource/obce/554782&lang=cs , I do not see the UTF-8 encoding issue you report, see screen shot attached:

screen shot 2015-01-24 at 15 16 44

jakubklimek commented 9 years ago

What about the fourth property? There is an erroneous character on the line containing MOMC.

HughWilliams commented 9 years ago

@jakubklimek: This looks like a browser rendering problem, because when I view the source of the page I see:

title="Typ MOMC, na něž je statutární město rozčleněno">Typ MOMC, na něž...�sto rozčleněno
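One possible reading of the replacement character in that shortened label, not confirmed anywhere in this thread, is that the label was cut at byte offsets rather than character offsets, splitting the two-byte UTF-8 sequence for "ě" in "město". The Python sketch below reproduces the same visible string under that assumption; `shorten_bytes` and its `head`/`tail` sizes are hypothetical and are not taken from the fct source.

```python
def shorten_bytes(text, head, tail):
    """Shorten a label by slicing its UTF-8 bytes (the suspected bug)."""
    raw = text.encode("utf-8")
    cut = raw[:head] + b"..." + raw[-tail:]
    # A tail that starts on a continuation byte cannot be decoded,
    # so the decoder substitutes U+FFFD for it.
    return cut.decode("utf-8", errors="replace")


def shorten_chars(text, head, tail):
    """Shorten a label by slicing characters, which never splits one."""
    return text[:head] + "..." + text[-tail:]


label = "Typ MOMC, na něž je statutární město rozčleněno"

# 18 bytes cover "Typ MOMC, na něž"; the last 17 bytes begin with the
# second byte of "ě" in "město".
broken = shorten_bytes(label, head=18, tail=17)
clean = shorten_chars(label, head=16, tail=15)
```

If this reading is right, the damaged character would already be present in the bytes the server sends, independent of the browser.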

jakubklimek commented 9 years ago

So the weird character is generated by the browser?

Anyway, I have narrowed it down to an issue in combination with Apache mod_proxy_html, which I use to make the URIs resolvable. When I display the page directly via /describe/?url=..., it looks like your screenshot. However, when I use the resolvable URI via Apache, the page is transcoded for no apparent reason: the output document is still UTF-8 encoded, but all the special Czech characters are destroyed. I am experimenting with Apache mod_proxy_html now but have not found a solution yet.
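The symptom described here (output still valid UTF-8, but the Czech characters destroyed) is what happens when UTF-8 bytes are read as a single-byte encoding and then written out as UTF-8 again. A minimal Python sketch of that effect follows, assuming ISO-8859-1 as the encoding the proxy falls back to; the thread does not state which charset is actually assumed.

```python
original = "Březiněves"

# What the backend sends: the label as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Assumed proxy behaviour: with no charset information, every byte is
# read as one ISO-8859-1 character ...
misread = utf8_bytes.decode("latin-1")

# ... and the result is re-encoded as UTF-8 for the client.
garbled_bytes = misread.encode("utf-8")
garbled = garbled_bytes.decode("utf-8")

# The output is valid UTF-8 but no longer the original text.
# Reversing the two steps recovers the label, which is a quick way to
# confirm this kind of double encoding.
recovered = garbled.encode("latin-1").decode("utf-8")
```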

jakubklimek commented 9 years ago

OK, I found the solution and tested it with a copy of the html page generated by fct. The issue is described here: http://serverfault.com/questions/559518/mod-proxy-html-garbles-non-ascii-characters

Could you add the missing <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> tag to fct generated XHTML output? When this is added there, mod_proxy_html detects the encoding correctly and does not destroy the characters.
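A quick way to check whether a generated page carries such a declaration is sketched below in Python, using only the standard library. The names `declared_charset` and `_CharsetFinder` are hypothetical, and the sketch looks only at `<meta>` tags, not at HTTP headers or an XML declaration.

```python
from html.parser import HTMLParser


class _CharsetFinder(HTMLParser):
    """Record the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        a = {name: (value or "") for name, value in attrs}
        if "charset" in a:
            # HTML5 form: <meta charset="UTF-8">
            self.charset = a["charset"].strip()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Older form: content="text/html; charset=UTF-8"
            for part in a.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip()


def declared_charset(html):
    """Return the charset named in a <meta> tag, or None if absent."""
    finder = _CharsetFinder()
    finder.feed(html)
    return finder.charset
```

Running `declared_charset` on a saved copy of the fct output before and after the change would show whether the tag is present.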

And the same problem goes for HTML generated by the SPARQL endpoint.

jakubklimek commented 8 years ago

@HughWilliams Is there any possibility of fixing this by adding the line mentioned in my previous comment? It is quite a simple fix. I have tried many times to work around this in Apache but failed.