selik / xport

Python reader and writer for SAS XPORT data transport files.
MIT License

Does it support UTF-8? #72

Closed: littlorry closed this issue 2 years ago

littlorry commented 2 years ago

I am trying to export some data with xport. It works well with ASCII, but an exception is thrown when the data contains Chinese characters. Can you tell me how to handle this? Thank you very much.

selik commented 2 years ago

Unicode was first published in 1991, and UTF-8 followed in 1992. I think the XPT format was created before then. I can't remember whether XPT allows Latin-1 or some other encoding with code points up to 255, but you could try something like this:

# Encode to UTF-8 bytes, then reinterpret each byte as one Latin-1 character.
blob = text.encode('utf-8')
text = blob.decode('latin-1')

Your text will carry all the same data as before, but the non-ASCII characters will look like garbage. You can then write it to XPT. Whoever reads the file will need to know to reverse the step: encode with Latin-1, then decode with UTF-8.
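To make the round trip concrete, here is a minimal sketch in plain Python. It does not call xport at all, and whether xport accepts characters in the 128 to 255 range when writing is an assumption you would need to verify. The helper names `to_latin1_carrier` and `from_latin1_carrier` are hypothetical, chosen only for this illustration.

```python
def to_latin1_carrier(text: str) -> str:
    """Encode text as UTF-8, then reinterpret each byte as a Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


def from_latin1_carrier(carrier: str) -> str:
    """Reverse the step above to recover the original text."""
    return carrier.encode("latin-1").decode("utf-8")


original = "中文"
carrier = to_latin1_carrier(original)

# Every code point in the carrier string fits in a single byte.
assert all(ord(ch) <= 255 for ch in carrier)

# Each of these Chinese characters takes 3 bytes in UTF-8,
# so the carrier string is longer than the original.
assert len(carrier) == 6

# The round trip is lossless.
assert from_latin1_carrier(carrier) == original
```

Note that the carrier string is longer than the original, so any fixed column width has to be sized for the UTF-8 byte count rather than the number of visible characters.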

selik commented 2 years ago

The correct answer is, "Don't use XPT." You'll need to coordinate with whoever you're sending files to. Either you'll need to tell them how to decode the text, or ask if they can use a file format which supports Unicode.