Open eifelkiwi opened 3 weeks ago
What encoding does your CSV file have? I would assume that Python opens it as UTF-8.
The other possible thing is that the HTML doesn't declare the encoding correctly, but I think I got that right:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
I had it in UTF-8. I had to change some entries manually and did that in Notepad++, which also reports the encoding as UTF-8.
Judging from your screenshot, the actual byte representation was UTF-8 at some point, because the "ü" shows up as two characters, one for each of the two bytes of its UTF-8 encoding. This happens when UTF-8 data gets interpreted as Latin-1 somewhere along the way.
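To illustrate what I mean, here is a minimal Python sketch of that round trip. The sample string is only an example and is not taken from your file:

```python
# Encode a name as UTF-8, then (wrongly) decode the bytes as Latin-1.
original = "ü"
raw = original.encode("utf-8")      # two bytes: 0xC3 0xBC
garbled = raw.decode("latin-1")     # each byte becomes its own character

print(raw)      # b'\xc3\xbc'
print(garbled)  # Ã¼

# As long as nothing was lost, the damage can be reversed the same way.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

If the name in the browser looks like the `garbled` value, the bytes themselves are probably fine and only one decoding step is wrong.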
I tried to reproduce it with this activities.csv:
And that imports correctly:
And it displays correctly in the browser:
Could you install the parquet tools and take a look at the file?
parquet-tools show Cache/Activity/activities.parquet
Then it should become clear whether the import is broken or whether the display in the browser is broken.
This is easier to read:
❯ parquet-tools show -c name Cache/Activity/activities.parquet
+--------------------+
| name |
|--------------------|
| Läüfen Ländäl Pärk |
+--------------------+
Non-ASCII characters (ä, ö, ü, but also emojis) in activity names are displayed on the webpage as if they had been decoded as ISO 8859-1 (Latin-1).
Additionally, some emojis (I could not determine which ones, but manually deleting all of them solved the problem) cause the script to throw an error (charmap undefined).
Edit: the compound emojis cause the bigger problem, namely an undefined character 0x8f (a byte that is not assigned). After this error occurred, I manually changed the CSV, but had to clear the whole cache before the script would run through.
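One possible explanation for the 0x8f, sketched in Python. This is an assumption about what happens, not something confirmed from the script: if the CSV is UTF-8 but is read with the Windows default code page (cp1252), then the variation selector U+FE0F, which is part of many compound emojis, contains a byte that cp1252 does not define:

```python
# U+FE0F (variation selector-16) appears in many compound emojis.
vs16 = "\ufe0f"
raw = vs16.encode("utf-8")
print(raw)  # b'\xef\xb8\x8f'

# Assumption: the file is decoded as cp1252 instead of UTF-8.
try:
    raw.decode("cp1252")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The offending byte is 0x8f, which cp1252 leaves unassigned.
print(error)
```

Plain umlauts only turn into wrong characters under that decoding, while a byte like 0x8f makes it fail outright, which would match the two different symptoms.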
Edit 2: this seems to be a CSV-parser issue only. Activities imported directly from Strava that contain emojis and umlauts are displayed correctly.
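If the CSV parser is indeed the culprit, passing the encoding explicitly when opening the file would be the usual remedy. A minimal sketch, assuming a hypothetical one-column file with a `name` header (the project's actual reader and column names may differ):

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data: umlauts plus an emoji (U+1F3C3, runner).
sample = "name\nLäüfen Ländäl Pärk \U0001F3C3\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "activities.csv"
    path.write_bytes(sample.encode("utf-8"))

    # Name the encoding explicitly instead of relying on the platform default.
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))

print(rows[0]["name"])
```

Without the `encoding` argument, `open` uses the locale's preferred encoding, which on Windows is often not UTF-8.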