tfussell / xlnt

Cross-platform user-friendly xlsx library for C++11+

Cell encoding #588

Closed: fmdn closed this issue 3 years ago

fmdn commented 3 years ago

Cannot get cells that contain letters with accents. I think it is an encoding problem.

DJ534 commented 3 years ago

Hi, I am using xlnt on Ubuntu 20.04 with accented letters without any problems; I read and write names in Czech. I read documents created on Windows in MS Office. They include these characters: ěščřžýáíéůú (I do not know whether they display correctly for you).

fmdn commented 3 years ago

Thank you for your reply. I have something like "Enseignement général", but what I read back is "Enseignement général".

DJ534 commented 3 years ago

I am not an encoding specialist, but I will try to make my best guess here. If you just read the data, write it to standard output and the characters come out wrong, your OS (or terminal) probably uses a different encoding than the data in the xlsx file. The compiler also applies its own encoding to string literals. Unfortunately, I do not know exactly how this comes into play.
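To see which encoding the text really has, it can help to look at the raw bytes instead of printing the string. A minimal sketch, assuming the cell text has already been read into a `std::string` (for an xlsx file it should hold UTF-8); `hex_bytes` is a hypothetical helper, not part of xlnt:

```cpp
#include <string>

// Hypothetical helper: render each byte of a std::string as two hex digits so
// the stored encoding is visible no matter how the terminal decodes text.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];     // high nibble
        out += digits[c & 0x0F];   // low nibble
    }
    return out;
}

// "é" (U+00E9) is two bytes in UTF-8 but a single byte in Latin-1:
//   hex_bytes("\xC3\xA9")  -> "C3 A9"   (UTF-8)
//   hex_bytes("\xE9")      -> "E9"      (Latin-1 / Windows-1252)
```

If the dump shows `C3 A9` where you expect `é`, the data is UTF-8 and the problem is on the display or comparison side.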

Here is something that may be useful to you: https://stackoverflow.com/questions/45194771/are-xlsx-files-utf-8-encoded-by-definition

So, first check the encoding of the xlsx document and the encoding of your system. If they do not match, there are libraries that perform conversions, Qt for example:

https://doc.qt.io/archives/qtjambi-4.5.2_01/com/trolltech/qt/core/QTextCodec.html
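If pulling in Qt (or a library such as iconv) is not an option, the UTF-8 to Latin-1 conversion is small enough to write by hand for the characters Latin-1 can represent. A minimal sketch, assuming UTF-8 input and that only U+0000..U+00FF must be handled; `utf8_to_latin1` is a hypothetical name, not an xlnt or Qt function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 text to Latin-1 (ISO-8859-1).
// Only code points U+0000..U+00FF fit in Latin-1, so the function returns
// false for anything outside that range and for malformed input.
bool utf8_to_latin1(const std::string& in, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);  // ASCII is identical in both encodings
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            // Two-byte sequence 110xxxxx 10yyyyyy encoding U+0080..U+00FF.
            unsigned char next = static_cast<unsigned char>(in[i + 1]);
            if ((next & 0xC0) != 0x80) return false;  // not a continuation byte
            out += static_cast<char>(((c & 0x1F) << 6) | (next & 0x3F));
            ++i;  // consumed the continuation byte
        } else {
            return false;  // code point above U+00FF, or malformed UTF-8
        }
    }
    return true;
}
```

Characters outside Latin-1, such as the Czech ě or ř, make it return false, which is a good reason to keep text in UTF-8 wherever possible.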

EDIT: I have been playing with the example you provided for a while and found that the mismatch most probably occurs between Windows-1252 and UTF-8. So I suspect your xlsx file is UTF-8 encoded while your system uses Windows-1252. Are you running Windows?
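The suspected mismatch can be reproduced without xlnt: take UTF-8 bytes and decode each byte as if it were a single-byte Latin-1 character. A minimal sketch with a hypothetical helper; Latin-1 stands in for Windows-1252 here, since the two agree on every byte except 0x80..0x9F:

```cpp
#include <string>

// Hypothetical helper: treat every byte of `bytes` as a Latin-1 character and
// encode that character as UTF-8. Applied to text that is *already* UTF-8,
// this reproduces the mojibake seen when UTF-8 data is shown through a
// single-byte code page.
std::string latin1_bytes_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // 110000xx
            out += static_cast<char>(0x80 | (c & 0x3F));  // 10xxxxxx
        }
    }
    return out;
}
```

Fed the UTF-8 bytes of "général", it produces "gÃ©nÃ©ral", the typical pattern of UTF-8 text displayed as Windows-1252. Fed genuine Latin-1 input, the same function is simply a correct Latin-1 to UTF-8 conversion.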

fmdn commented 3 years ago

Thank you, you are right: my file was written in UTF-8, but the string I use is Latin-1.
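With the file in UTF-8 and the comparison string in Latin-1, the two byte sequences never match even though they look the same on screen. One way to avoid this, sketched under the assumption that the cell text is compared as a `std::string` of UTF-8 bytes, is to spell the expected value with explicit UTF-8 escapes, which do not depend on the source file encoding or the compiler's execution character set; `is_enseignement_general` is a hypothetical name for illustration:

```cpp
#include <string>

// Hypothetical check: compare UTF-8 text read from an xlsx cell against an
// expected value written with explicit UTF-8 byte escapes, so a Latin-1
// literal cannot slip into the comparison.
bool is_enseignement_general(const std::string& cell_text_utf8) {
    // "Enseignement général": é (U+00E9) is the byte pair C3 A9 in UTF-8.
    static const std::string expected = "Enseignement g\xC3\xA9n\xC3\xA9ral";
    return cell_text_utf8 == expected;
}
```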