psu-libraries / scholarsphere-3

A web application for ingest, curation, search, and display of digital assets. Powered by Hydra technologies (Rails, Hydra-head, Blacklight, Solr, Fedora Commons, etc.)
Apache License 2.0

Mangled Text from non-UTF Characters in READMEs #1582

Open · srerickson opened this issue 5 years ago

srerickson commented 5 years ago

One ScholarSphere (SS) feature that curators use often is the display of READMEs on work pages with Markdown formatting. READMEs appear to be assumed to be UTF-8 encoded, so when a non-UTF-8 README is uploaded, the displayed text is mangled. For example, this README appears to be encoded in Windows-1252, and as a result "città di Roma" is displayed as "citt� di Roma".
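The following Ruby sketch reproduces the reported output under one assumption: the Windows-1252 bytes are read as UTF-8, and the invalid sequence is replaced with U+FFFD, the Unicode replacement character. In Windows-1252, "à" is the single byte 0xE0, which is not a valid UTF-8 sequence on its own. This only illustrates the failure mode. ScholarSphere's actual rendering code is not shown in the issue.

```ruby
# Bytes for "città di Roma" as a Windows-1252 encoder would write them:
# "à" is the single byte 0xE0. `.b` gives a binary (ASCII-8BIT) copy.
raw = "citt\xE0 di Roma".b

# Assumption: the README bytes are treated as UTF-8, as the issue describes.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?   # prints "false"

# Replacing the invalid byte with U+FFFD yields the mangled text.
puts as_utf8.scrub             # prints "citt� di Roma"
```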

To resolve this, curators currently have to download the README, convert it to UTF-8, and re-upload it. It would be useful if ScholarSphere could perform this conversion in place.
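A minimal sketch of in-place conversion, written in Ruby to match the Rails stack. It assumes that any README that is not already valid UTF-8 is Windows-1252, which fits the example in this issue but is only a heuristic. The names `normalize_readme` and `convert_readme_in_place!` are hypothetical, and a local file path stands in for the repository's storage layer (Fedora, per the project description), so a real implementation would look different.

```ruby
# Hypothetical helper: return README text as UTF-8.
# Assumption: bytes that are not valid UTF-8 are Windows-1252.
def normalize_readme(bytes, fallback: Encoding::Windows_1252)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding? # already UTF-8: leave it alone

  # Reinterpret the bytes in the fallback encoding and transcode.
  # Bytes undefined in Windows-1252 (e.g. 0x81) become U+FFFD
  # instead of raising Encoding::UndefinedConversionError.
  bytes.dup.force_encoding(fallback)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Hypothetical in-place variant for a local file; returns true if rewritten.
def convert_readme_in_place!(path)
  original = File.binread(path)          # ASCII-8BIT bytes
  converted = normalize_readme(original)
  return false if converted.b == original # bytes unchanged, nothing to do

  File.binwrite(path, converted)
  true
end

puts normalize_readme("citt\xE0 di Roma".b) # prints "città di Roma"
```

Checking UTF-8 validity first leaves files that are already UTF-8 untouched. The fallback is still a guess: files in other legacy encodings would be mis-transcoded, so a real feature might let curators choose the source encoding.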

This is related to #1236; however, the functionality requested here would alter the saved file.

Low priority.