openambitproject / openambit

Languages with unicode characters #266

Open cbertram opened 3 years ago

cbertram commented 3 years ago

I have my watch set to Danish. In Danish we have the characters æ, ø, å. For example "run" is "løb" in Danish. When importing a run to openambit, the title is not "Løb" as expected, but "LÃžb". It would be nice if full UTF-8, or whatever the watch uses, was supported.

centic9 commented 3 years ago

Can you attach some related log files from the ~/.openambit directory, both .log and .xml if they are available, so we can take a look at how the data is stored there?

cbertram commented 3 years ago

Here is a short .log with <Activity>LÃžb</Activity>, which should be <Activity>Løb</Activity>. (It's not a real run, just a test.) I can't see any .xml files in ~/.openambit/.

log_060E115111000A00_2021_01_26_14_10_16.log

centic9 commented 3 years ago

I reviewed the relevant code locations and checked the results when using an activity name with non-ASCII characters (German umlauts in my case). With an Ambit 2 this worked fine:

        <Activity>Übergang</Activity>

The code seems to be the same for Ambit 3 as well.

The current code expects the watch to send data in ISO-8859-15, which also contains the characters that you use, so I would expect it to work for you as well. I also tested with an activity name that contains such a character, and it still worked for me.

Maybe the watch uses a different character encoding depending on your location or something like that... :(

Which regional settings do you use in your movescount profile? E.g. from general profile: location, timezone, home-location, and from watch-settings: language, ...

paddy-hack commented 3 years ago

Please see #70 for additional information on character encodings. For later models things may have changed of course.
Sticking with ISO-8859-15 makes it impossible to cater to the Asian market for one thing.
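
To illustrate both points made so far (ISO-8859-15 covers the Danish and German characters, but not Asian scripts), here is a minimal Python sketch. It is not openambit code; the helper name `fits_latin9` and the Japanese sample string are assumptions chosen purely for illustration.

```python
# Sample activity names; the Japanese one is a made-up example, not from any log.
samples = ["Løb", "Übergang", "Indendørstræning", "ランニング"]

def fits_latin9(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-15."""
    try:
        text.encode("iso-8859-15")
        return True
    except UnicodeEncodeError:
        return False

for s in samples:
    utf8_len = len(s.encode("utf-8"))
    print(f"{s!r}: fits ISO-8859-15 = {fits_latin9(s)}, UTF-8 bytes = {utf8_len}")
```

The Danish and German names fit in the single-byte encoding, while the Japanese name does not, which is why a fixed ISO-8859-15 assumption cannot work for every language a watch might offer.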

centic9 commented 3 years ago

Thanks for the link to previous work, that is helpful.

Maybe the Ambit 3 natively encodes in UTF-8, and this causes the double encoding here.

LÃžb is hex 4C C3 83 C5 BE 62

Løb is hex 4C C3 B8 62

Both look like UTF-8 encoded values (note the C3 byte). You get the wrong one if you perform the conversion "ISO-8859-15 -> UTF-8" twice.
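
The double-conversion hypothesis can be checked against the two hex strings with a few lines of Python. This is only a sketch of the suspected mechanism, not the conversion code openambit actually uses; the function name `latin9_to_utf8` is an assumption.

```python
def latin9_to_utf8(raw: bytes) -> bytes:
    """Treat raw as ISO-8859-15 text and re-encode it as UTF-8."""
    return raw.decode("iso-8859-15").encode("utf-8")

name = "Løb"
as_latin9 = name.encode("iso-8859-15")  # what the current code expects from the watch
as_utf8 = name.encode("utf-8")          # what an Ambit 3 might send (hypothesis)

once = latin9_to_utf8(as_latin9)        # input really is ISO-8859-15: correct result
wrong = latin9_to_utf8(as_utf8)         # input was already UTF-8: converted a second time

print(once.hex(" ").upper())    # 4C C3 B8 62
print(wrong.hex(" ").upper())   # 4C C3 83 C5 BE 62
print(latin9_to_utf8(once) == wrong)  # True: same as applying the conversion twice
```

The second result matches the bytes found in the log, which supports the idea that the watch already delivers UTF-8 and the import step converts it once more.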

@cbertram Maybe you can test with an activity name which has lots of "special characters" in its name and attach the resulting logs so that we can verify whether that is the case.

BTW, for me "Dansk" is not even available as a language in the watch settings for the Ambit 2.

cbertram commented 3 years ago

I made two very short exercises.

log_060E115111000A00_2021_02_01_10_16_14.log The first is called Svøm. i åb. vand in the menu, but results in <Activity>SvÃžm. i Ã¥b. va</Activity>.

log_060E115111000A00_2021_02_01_10_16_56.log The second is called Indendørstræning in the menu, but results in <Activity>IndendÃžrstrÊni</Activity>.
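
Both logged names stop exactly where the first 16 bytes of the UTF-8 form of the menu name end, which fits the UTF-8 hypothesis. The Python sketch below models that path and shows how the misreading could be reversed. The 16-byte field length is an assumption inferred from these two logs only, and the function names are hypothetical.

```python
NAME_FIELD_BYTES = 16  # assumption: inferred from the two logs, not from any spec

def simulate_logged_name(menu_name: str) -> str:
    """Model the suspected path: UTF-8 on the watch, cut to a fixed
    byte length, then misread as ISO-8859-15 on import."""
    raw = menu_name.encode("utf-8")[:NAME_FIELD_BYTES]
    return raw.decode("iso-8859-15")

def undo_misread(logged: str) -> str:
    """Reverse the misreading: back to bytes via ISO-8859-15, then decode as UTF-8."""
    return logged.encode("iso-8859-15").decode("utf-8", errors="replace")

for menu_name in ["Svøm. i åb. vand", "Indendørstræning"]:
    logged = simulate_logged_name(menu_name)
    print(repr(logged), "->", repr(undo_misread(logged)))
```

If the model is right, the recovered names are the menu names minus the characters lost to truncation, so the garbling itself is reversible while the truncation is not.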

centic9 commented 3 years ago

I prepared a test branch at https://github.com/openambitproject/openambit/tree/try_utf_8_encoding_for_ambit_3 which simply switches to always reading data from the Ambit 3 directly as UTF-8.

Can you try this version to see if that makes it work for you?
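
The idea behind the test branch, choosing the text encoding by watch model, can be sketched in Python as follows. This is not the code in that branch: the mapping, the model strings, and the function name `decode_activity_name` are all hypothetical, and which models really send UTF-8 is still unconfirmed.

```python
# Hypothetical mapping; only the Ambit 3 behaviour is suspected, not confirmed.
ENCODING_BY_MODEL = {
    "Ambit 2": "iso-8859-15",
    "Ambit 3": "utf-8",
}

def decode_activity_name(raw: bytes, model: str) -> str:
    """Decode a raw activity-name field using the encoding assumed for the model."""
    encoding = ENCODING_BY_MODEL.get(model, "iso-8859-15")
    # errors="replace" keeps the name readable if truncation cut a
    # multi-byte UTF-8 character in half.
    return raw.decode(encoding, errors="replace")

print(decode_activity_name(b"L\xc3\xb8b", "Ambit 3"))    # Løb
print(decode_activity_name(b"\xdcbergang", "Ambit 2"))   # Übergang
```

Falling back to ISO-8859-15 for unknown models keeps the existing behaviour for watches where it is known to work.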