martin-ueding / geo-activity-playground

Data analysis and visualization based on GPS tracked outdoor activities.
https://martin-ueding.github.io/geo-activity-playground/
MIT License

Characters in activity names are not properly displayed #154

Open eifelkiwi opened 3 weeks ago

eifelkiwi commented 3 weeks ago

UTF-8 characters (ä, ö, ü, and also emojis) in activity names are displayed on the webpage as if they had been decoded as ISO 8859-1 (Latin-1).

image

Additionally, some emojis (I could not work out which ones, but manually deleting all of them solved the problem) cause the script to throw an error (charmap undefined).

Edit: those compound emojis cause a big problem, namely an undefined char 0x8f (which is not assigned). After this error occurred, I manually changed the CSV, but I had to clear the whole cache before the script would run through.

Edit 2: this seems to affect only the CSV parser. Activities with emojis and umlauts that were imported directly from Strava are displayed correctly.
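
One way a byte 0x8f can turn up, as a minimal sketch: the UTF-8 encoding of some emojis contains the byte 0x8f, and the Windows-1252 code page (Python's `cp1252` codec) leaves that byte unassigned. This assumes the CSV was decoded as cp1252 somewhere along the way, which this thread has not confirmed. The runner emoji is only an illustrative choice.

```python
# Assumption: the file is UTF-8 but gets decoded as cp1252, a common
# locale default on Windows. Not confirmed for this issue.
emoji = "\U0001F3C3"  # runner emoji, chosen only for illustration
raw = emoji.encode("utf-8")
print(raw.hex(" "))  # f0 9f 8f 83

try:
    raw.decode("cp1252")
    decode_failed = False
except UnicodeDecodeError as error:
    # error.start is the index of the first byte that could not be decoded.
    decode_failed = True
    failing_byte = raw[error.start]
    print(f"cannot decode byte {failing_byte:#x}")
```

If this is what happens, opening the file with an explicit `encoding="utf-8"` would avoid depending on the platform default.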

martin-ueding commented 2 weeks ago

What encoding does your CSV file have? I would assume that Python opens it as UTF-8.

The other possible thing is that the HTML doesn't declare the encoding correctly, but I think I got that right:

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8">

eifelkiwi commented 2 weeks ago

I had it in UTF-8. I had to change some entries manually and did that in Notepad++, which also reports the encoding as UTF-8.
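
To check the encoding independently of what an editor reports, the raw bytes can be inspected. The following is a minimal sketch; `describe_encoding` is a hypothetical helper and not part of geo-activity-playground.

```python
import codecs
from pathlib import Path


def describe_encoding(path):
    """Report whether a file's bytes are valid UTF-8 and whether it starts with a BOM."""
    data = Path(path).read_bytes()
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"valid_utf8": valid_utf8, "has_bom": has_bom}
```

Calling `describe_encoding("activities.csv")` returns a small dictionary. A file that is valid UTF-8 on disk points towards the problem being in how the file is decoded when it is read.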

martin-ueding commented 1 week ago

Your screenshot suggests that the bytes were UTF-8 at some point, because the "ü" shows up as two characters, which matches its two-byte UTF-8 encoding. This happens when UTF-8 data is interpreted as Latin-1 somewhere along the way.
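
The effect can be reproduced in a few lines of Python. This is only an illustration of the mechanism, not code from the project:

```python
original = "ü"

# In UTF-8, "ü" takes two bytes.
utf8_bytes = original.encode("utf-8")
print(len(utf8_bytes))  # 2

# Interpreting those two bytes as Latin-1 yields two separate characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # Ã¼

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```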

I tried to reproduce it with this activities.csv:

Bildschirmfoto_20240907_141130

And that imports correctly:

Bildschirmfoto_20240907_141314

And it displays correctly in the browser:

Bildschirmfoto_20240907_141107

Could you install the parquet tools and take a look at the file?

parquet-tools show Cache/Activity/activities.parquet

Then it should become clear whether the import is broken or whether the display in the browser is broken.

martin-ueding commented 1 week ago

This is easier to read:

❯ parquet-tools show -c name Cache/Activity/activities.parquet 
+--------------------+
| name               |
|--------------------|
| Läüfen Ländäl Pärk |
+--------------------+
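
In the output above the umlauts are stored intact. For a larger cache, the same check could be done programmatically. The sketch below is a heuristic, not part of the project: `looks_like_mojibake` and `find_garbled_names` are hypothetical helpers, and the second one assumes that pandas with a Parquet engine is installed and that the file has a `name` column as shown above.

```python
def looks_like_mojibake(text):
    """Heuristic: True if text looks like UTF-8 that was decoded as Latin-1."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either not representable in Latin-1 or not valid UTF-8 afterwards,
        # so the text was probably not garbled in this particular way.
        return False
    return repaired != text


def find_garbled_names(path):
    """Return the activity names in a Parquet file that look garbled."""
    import pandas as pd  # assumption: pandas and a Parquet engine are available

    names = pd.read_parquet(path, columns=["name"])["name"]
    return [name for name in names if looks_like_mojibake(str(name))]
```

An empty result from `find_garbled_names("Cache/Activity/activities.parquet")` would suggest that the import stored the names correctly and the problem lies in the display.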