isawnyu / pleiades-gazetteer

This repository provides a home for tickets and other planning documents for the Pleiades gazetteer of ancient places. Code is kept in multiple other repositories.
https://pleiades.stoa.org

CSV dumps mangling UTF-8 encoding in creator/contributor names #306

Open ryanfb opened 7 years ago

ryanfb commented 7 years ago

On https://pleiades.stoa.org/places/589802, the name Ηλίας Κολοβός gets mangled to Œ. ŒöŒøŒªŒøŒ≤œåœÇ (this may not display correctly here, because the mangled output also contains an invalid UTF-8 byte sequence).
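For what it's worth, the characters shown match what you get when the UTF-8 bytes of the name are decoded as Mac OS Roman. That is an inference from the output, not something established in this thread. Here is a minimal Python sketch that reproduces the pattern and reverses it. `mangle` and `unmangle` are hypothetical helper names used only for illustration.

```python
# Sketch: reproduce the reported mojibake, ASSUMING the export chain decoded
# UTF-8 bytes as Mac OS Roman somewhere (an inference, not confirmed here).

def mangle(name: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Mac OS Roman."""
    return name.encode("utf-8").decode("mac_roman")


def unmangle(garbled: str) -> str:
    """Undo the mistake: re-encode as Mac OS Roman, then decode as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")


if __name__ == "__main__":
    garbled = mangle("Κολοβός")
    print(garbled)            # ŒöŒøŒªŒøŒ≤œåœÇ
    print(unmangle(garbled))  # Κολοβός
```

Each Greek letter is two bytes in UTF-8 (for example Κ is `CE 9A`). Mac OS Roman maps every byte to its own character, so each letter turns into a pair such as `Œö`.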

This appears to have first been introduced in the May 10th CSV places dump; the May 9th places dump is fine.

You can quickly check this with the csvlint command-line helper tool from the csvlint Ruby gem (gem install csvlint).
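If csvlint is not at hand, a standard-library Python check can cover the encoding part. This is a sketch, not a csvlint replacement: it only tests whether the bytes decode strictly as UTF-8 and whether they start with a BOM. The function names and the file name in the usage note are hypothetical.

```python
import codecs
from pathlib import Path


def check_utf8_bytes(data: bytes) -> str:
    """Report whether `data` is valid UTF-8 without a leading BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "starts with a UTF-8 BOM"
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError as err:
        return f"invalid UTF-8 at byte offset {err.start}"
    return "ok"


def check_utf8_file(path) -> str:
    """Read the whole file as bytes and run the same check on it."""
    return check_utf8_bytes(Path(path).read_bytes())
```

Usage, with a hypothetical local copy of a dump: `print(check_utf8_file("places-dump.csv"))`. The check works on raw bytes, so it catches both a BOM and the kind of invalid byte sequence described above. A text-mode read would hide the BOM or fail before you could see where the problem is.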

paregorios commented 7 years ago

Changed the user's full name to a romanized version as a workaround, but the underlying encoding problem should still be addressed at some point.

paregorios commented 5 years ago

All Pleiades CSV files should be saved as UTF-8 without a BOM. This still needs to be verified.
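The following is a Python sketch of BOM-free UTF-8 CSV output. It assumes Python's `csv` module purely for illustration; the actual Pleiades export code lives in other repositories and is not shown here. The key detail is the codec name. Plain `"utf-8"` writes no BOM, while `"utf-8-sig"` prepends one. `to_csv_bytes` and `write_csv` are hypothetical names.

```python
import codecs
import csv
import io


def to_csv_bytes(rows) -> bytes:
    """Serialize rows to CSV and encode as UTF-8 with no BOM."""
    buf = io.StringIO(newline="")  # no newline translation; csv emits \r\n itself
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")  # "utf-8", not "utf-8-sig"


def write_csv(path, rows) -> None:
    """Same idea for a file on disk: explicit encoding, newline="" as the csv docs advise."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        csv.writer(fh).writerows(rows)
```

Setting the encoding explicitly matters because `open()` otherwise falls back to the platform's locale encoding, which is not guaranteed to be UTF-8.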