spiral-project / ihatemoney

A simple shared budget manager web application
https://ihatemoney.org

Fixing the issue of Chinese character garbled encoding in CSV export. #1288

Open · qqAys opened this pull request 9 months ago

qqAys commented 9 months ago

This fixes garbled Chinese characters when the CSV export is opened in Microsoft Excel. With version 6.1.3, opening a CSV export that contains Chinese text in Microsoft Excel in a Chinese-language environment produces garbled characters because of an encoding mismatch, as shown in the image (top: the current export; bottom: the export with this fix).

[Screenshot: CSV export opened in Excel, current version (top) vs. with this fix (bottom)]
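A minimal Python sketch of the failure mode, under stated assumptions: it assumes Excel decodes a CSV that has no BOM with the system's legacy code page (GBK in a Simplified Chinese locale) and decodes it as UTF-8 when a UTF-8 BOM is present. `guess_decode` is a hypothetical model of that behaviour, not Excel's actual logic, and the sample rows are made up rather than taken from ihatemoney's export.

```python
import codecs

# Hypothetical model of how a spreadsheet app might choose a decoding for a
# CSV file: honour a UTF-8 BOM if present, otherwise fall back to the
# system's legacy code page (GBK here, assuming a Simplified Chinese locale).
def guess_decode(data: bytes, fallback: str = "gbk") -> str:
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the BOM while decoding
        return data.decode("utf-8-sig")
    # errors="replace" turns undecodable byte sequences into U+FFFD
    return data.decode(fallback, errors="replace")

# Made-up sample rows (date, amount, description)
rows = "日期,金额,描述\n2024-02-04,100,午饭\n"

plain = rows.encode("utf-8")         # current export: UTF-8 without a BOM
with_bom = rows.encode("utf-8-sig")  # proposed export: BOM + UTF-8

print(guess_decode(plain))     # garbled: UTF-8 bytes read as GBK
print(guess_decode(with_bom))  # the original Chinese text
```

The point of the model is that the bytes of the data are identical in both files; only the three-byte prefix changes which decoder gets picked.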
almet commented 9 months ago

Thanks. This seems to be failing the tests though… We might need to change the way the .csv files are loaded.
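One way the loading side could be adapted, sketched with Python's standard `csv` module: decoding a BOM-prefixed export as plain `utf-8` leaves U+FEFF glued to the first header name, so lookups by column name fail, while decoding as `utf-8-sig` strips the BOM and also accepts files that have none. The `load_rows` helper and the column names are hypothetical, not ihatemoney's actual test code or export format.

```python
import csv
import io

# Hypothetical exported file: a UTF-8 BOM, a header row and one bill.
exported = "what,amount\n午饭,100\n".encode("utf-8-sig")

def load_rows(data: bytes, encoding: str) -> list[dict[str, str]]:
    # Decode the raw bytes, then parse them with the standard csv module.
    return list(csv.DictReader(io.StringIO(data.decode(encoding))))

naive = load_rows(exported, "utf-8")
fixed = load_rows(exported, "utf-8-sig")

print(list(naive[0]))  # first key carries the BOM: ['\ufeffwhat', 'amount']
print(list(fixed[0]))  # ['what', 'amount']
```

Because `utf-8-sig` only removes a BOM when one is present, a loader using it works for exports written both with and without the BOM.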

qqAys commented 9 months ago

Happy Chinese New Year! Thank you for bringing this to my attention. I've made the necessary adjustments to fix the test failures, including changes to how the .csv files are loaded so they are handled correctly. Please let me know if you run into any further issues or if there's anything else I can help with.

zorun commented 8 months ago

utf-8-sig seems to add a BOM sequence at the start of the CSV file.

Is that standard, and could it break other clients? It feels like Excel is the problem here: it should handle UTF-8 encoded files correctly.
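For reference, the Unicode Standard permits a BOM in UTF-8 but neither requires nor recommends it. A small Python sketch (sample data is hypothetical) showing that `utf-8-sig` differs from `utf-8` only by the three-byte UTF-8 BOM written at the start of the output:

```python
import codecs
import csv
import io

# Write a tiny CSV to an in-memory text buffer (hypothetical sample data).
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["what", "amount"])
text = buf.getvalue()

as_utf8 = text.encode("utf-8")
as_utf8_sig = text.encode("utf-8-sig")

# The only difference is the BOM (EF BB BF) prepended by utf-8-sig.
print(as_utf8_sig[:3])                           # b'\xef\xbb\xbf'
print(as_utf8_sig == codecs.BOM_UTF8 + as_utf8)  # True
```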

qqAys commented 8 months ago

Thanks for your comment! I found that changing the encoding of the CSV file from UTF-8 to UTF-8-SIG fixes the garbled Chinese characters. UTF-8-SIG prepends a byte order mark (BOM, the bytes EF BB BF) to the file, which tells software such as Excel that the content is UTF-8 so that it is decoded correctly.

As for other clients, that mainly depends on how each one is implemented. Most modern software handles UTF-8 files with a BOM correctly, but older software, or software with particular settings, may not strip the BOM and can end up treating it as part of the first field. So using UTF-8-SIG does require weighing compatibility across different clients.

You're right that Excel ideally should handle UTF-8 encoded files correctly. Modern versions of Excel can read UTF-8 files, for instance when the encoding is chosen explicitly on import, but opening a CSV without a BOM directly can still decode it with the system's legacy code page, which produces the garbling reported here. Changing that behaviour is up to the vendor.

In short, UTF-8-SIG is an effective fix for Excel, but its compatibility trade-off with other clients should be weighed to give the best overall user experience. Thanks again for your comment and suggestion!