zackbatist / open-archaeo

A list of open source archaeological software and resources
https://open-archaeo.info
Creative Commons Zero v1.0 Universal

Accented characters are broken #31

Closed by zackbatist 1 year ago

zackbatist commented 1 year ago

This has been a recurring issue for some time, and it warrants closer investigation. Characters with accents get replaced with strange combinations of other characters or fail to render at all. This may have to do with the different applications that touch the primary data source: the text editors and spreadsheet editors used by various contributors, R as the data passes through csv2md.r, Hugo, and Git. Different operating systems may also affect character encoding in different ways, particularly in Finder/File Explorer.

joeroe commented 1 year ago

I'm quite confident that the encoding will be preserved by R and Hugo. The problem appears to arise when open-archaeo.csv is edited with the wrong encoding. For example, in e7f139041e7af7c3f3710882fa2c4b5ae8289752 it looks like whatever software was used to edit the file interpreted it as Windows-1252 instead of UTF-8 (see https://www.i18nqa.com/debug/utf8-debug.html). R, Hugo, GitHub and browsers (because we have <meta charset="utf-8"> in the rendered HTML) all assume the file is in UTF-8.
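
To illustrate the kind of damage described above, here is a minimal Python sketch of what happens when UTF-8 bytes are read as Windows-1252 and then saved again as UTF-8. The function names `garble` and `repair` and the sample string are made up for illustration and are not taken from the repository; the sketch also assumes the editor behaved like Python's `cp1252` codec, which may not match the actual software involved.

```python
def garble(text: str) -> str:
    """Simulate an editor that reads UTF-8 bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo the damage by reversing the wrong decoding step.

    This only works when every garbled character maps back to a
    Windows-1252 byte; Python's cp1252 codec leaves a few bytes
    (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so some accented
    characters cannot make the round trip with this sketch.
    """
    return text.encode("cp1252").decode("utf-8")


original = "Zürich café"  # hypothetical sample value
garbled = garble(original)

print(garbled)                      # ZÃ¼rich cafÃ©
print(repair(garbled) == original)  # True
```

Each accented character becomes two characters because its two-byte UTF-8 sequence is read as two separate single-byte Windows-1252 characters.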

We could perhaps set up a GitHub Action that blocks merging of PRs with any of these encoding problems?
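
As a rough sketch of what such a check might run, the Python below flags a file that is not valid UTF-8 or that contains words which look like UTF-8 text mis-read as Windows-1252. The function names are hypothetical, the detection is a heuristic that can miss cases or occasionally flag legitimate text, and how it would be wired into a workflow is left open.

```python
def looks_double_encoded(word: str) -> bool:
    """Heuristic: True if the word decodes cleanly as UTF-8 after being
    re-encoded as Windows-1252, which suggests it was mis-read earlier."""
    if word.isascii():
        return False
    try:
        word.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Correctly encoded accented text normally ends up here.
        return False
    return True


def find_encoding_problems(data: bytes) -> list[str]:
    """Return a list of human-readable problems found in the raw file bytes."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            if looks_double_encoded(word):
                problems.append(f"line {number}: suspicious text {word!r}")
    return problems


def check_file(path: str) -> int:
    """Print any problems and return an exit code (0 = clean, 1 = problems)."""
    with open(path, "rb") as handle:
        problems = find_encoding_problems(handle.read())
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A workflow step could then call something like `sys.exit(check_file("open-archaeo.csv"))`, so that a non-zero exit code fails the check on the PR.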

zackbatist commented 1 year ago

I tried using a different text editor and bypassing Excel completely, and it works fine now. The text editor I had been using has not been updated in years, so I will switch to a new one. I think this is a unique case (I use an older computer with a limited range of supported software), so I don't think there is any need to set up a GitHub Action.