ufal / clarin-dspace

clarin-dspace digital repository based on DSpace and LINDAT/CLARIN DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License

encoding issue with the preview #830

Open Ansa211 opened 6 years ago

Ansa211 commented 6 years ago

Go to https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2538 and preview the HTML file (the second file in the submission): the encoding appears to be broken. Even when I click the "Download file" button and the file opens in a new browser tab, the encoding is still broken. However, when I right-click the "Download file" button, save the file to disk, and open it in the same browser, everything displays correctly. The header in the HTML specifies the encoding as UTF-8.

Tested with

riccardodg commented 2 years ago

There is a similar issue related to UTF-8 encoding. It happens with ILC4CLARIN's installation of DSpace and also with UFAL's. This item (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/OPEN-956) has an XML file as a bitstream, containing Greek and Italian characters. When I click the download button, the browser displays the file with the wrong encoding. You can see it at https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/bitstream/handle/20.500.11752/OPEN-956/NAppendix-01.xml?sequence=3&isAllowed=y

The same happens with https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1942; see it at https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11372/LRT-1942/proclitics.xml?sequence=1&isAllowed=y

The interesting part is that:
1) If the file is saved locally, it is displayed correctly.
2) With Safari 9.3.5 it is displayed correctly even when served.
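To illustrate why a served copy can look broken while the saved copy looks fine, here is a minimal Python sketch. It assumes the browser decodes the response bytes using the charset named in the HTTP header rather than the one declared inside the file, and that the header names a single-byte charset such as ISO-8859-1. The function name `render_as` and the sample string are hypothetical and not taken from the repository.

```python
def render_as(text: str, declared_charset: str) -> str:
    """Encode text as UTF-8 (as stored on disk), then decode it with
    the charset a client was told to use (hypothetical helper)."""
    raw = text.encode("utf-8")
    return raw.decode(declared_charset)


sample = "è"  # one Italian character, two bytes in UTF-8

# Decoded with the right charset, the text round-trips unchanged.
print(render_as(sample, "utf-8"))

# Decoded as ISO-8859-1, each UTF-8 byte becomes its own character.
print(render_as(sample, "iso-8859-1"))
```

A file opened from disk carries no HTTP header, so the browser has to rely on the declaration inside the file, which would be consistent with the saved copy displaying correctly.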

I also tried the following:
1) curl -I "https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/bitstream/handle/20.500.11752/OPEN-956" returned Content-Type: text/html;charset=utf-8
2) curl -I "https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/bitstream/handle/20.500.11752/OPEN-956/NAppendix-01.xml?sequence=3&isAllowed=y" returned Content-Type: text/xml;charset=ISO-8859-1
3) file NAppendix-01.xml reported XML 1.0 document text, UTF-8 Unicode text
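The checks above suggest that the bitstream is labelled ISO-8859-1 in the HTTP header although the file itself is UTF-8. A minimal Python sketch of how such a mismatch could be detected automatically follows. It works on a header value and the first bytes of a file that have already been fetched, so it makes no network calls. The function names are hypothetical, and the sample XML declaration is an assumption: the thread does not show the actual first line of NAppendix-01.xml.

```python
import codecs
import re
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased."""
    match = re.search(r'charset\s*=\s*"?([^";\s]+)', header_value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def encoding_from_xml_declaration(head: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, lowercased."""
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]  # skip an optional UTF-8 BOM
    match = re.match(rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']", head)
    return match.group(1).decode("ascii").lower() if match else None


def charsets_disagree(header_value: str, head: bytes) -> bool:
    """True when both sources name a charset and the names differ."""
    served = charset_from_content_type(header_value)
    embedded = encoding_from_xml_declaration(head)
    return served is not None and embedded is not None and served != embedded


# Header value as reported by curl -I in the comment above.
header = "text/xml;charset=ISO-8859-1"
# Assumed declaration; the real file may differ.
head = b'<?xml version="1.0" encoding="UTF-8"?>\n<root/>'
print(charsets_disagree(header, head))
```

The comparison is a plain string match on lowercased names, so aliases of the same encoding (for example `latin1` and `iso-8859-1`) would be reported as a mismatch; that is acceptable for a first diagnostic but not for a strict check.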

Tested with:
Safari 15.2 (NOT WORKING)
Firefox 96.0.2 (for Mac) (NOT WORKING)
Safari 9.3.5 (WORKING)