aqkfatmtvvfb closed this issue 6 years ago.
Hello Yu,
Strings should be encoded in 'UTF-8' for MDF version 4 and in 'latin-1' for MDF 3.
The application that generates the file has to use those encodings:
bytestring = '贴马路牙子左右转向'.encode('utf-8')
string = bytestring.decode('utf-8')
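A minimal sketch of the reading side of this rule, assuming the version-to-encoding mapping stated above (UTF-8 for MDF 4, latin-1 for MDF 3) and assuming a text block may carry a trailing NUL terminator. `decode_mdf_text` and `MDF_TEXT_ENCODING` are hypothetical names, not part of mdfreader's API. Decoding is strict, so invalid UTF-8 in an MDF 4 file surfaces as an error instead of as silently garbled text.

```python
# Hypothetical mapping, following the rule stated in this thread:
# MDF 4 text is UTF-8, MDF 3 text is latin-1.
MDF_TEXT_ENCODING = {3: "latin-1", 4: "utf-8"}


def decode_mdf_text(raw: bytes, mdf_version: float) -> str:
    """Decode a text block's raw bytes with the encoding its MDF version mandates.

    Hypothetical helper, not mdfreader's API. Assumes the block may end with
    a NUL terminator and padding, which are stripped.
    """
    major = int(mdf_version)  # e.g. 3.30 -> 3, 4.10 -> 4
    encoding = MDF_TEXT_ENCODING[major]
    # errors="strict" (the default): invalid UTF-8 in an MDF 4 file raises
    # UnicodeDecodeError. latin-1 can decode any byte, so MDF 3 never raises.
    return raw.decode(encoding).rstrip("\x00")
```

For example, `decode_mdf_text(b'caf\xe9', 3)` yields `'café'`. Note that the latin-1 branch can never detect a wrongly encoded MDF 3 file, because every byte is a valid latin-1 character.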
@danielhrisca Thanks for this information! I think it is necessary to detect the encoding after reading the raw comment in MDF 3 and before saving it to the data structure. There is an existing library, 'chardet', that can solve this problem:
>>> chardet.detect(comment.encode('latin-1'))
{'confidence': 0.99, 'language': 'Chinese', 'encoding': 'GB2312'}
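One way this detection step could be wired in is sketched below, under stated assumptions: the comment has already been decoded as latin-1 (the MDF 3 rule), so re-encoding it with latin-1 recovers the exact bytes from the file; a detector returning a chardet-style dict with `encoding` and `confidence` keys (the shape of the `chardet.detect` output above) is consulted; and its guess is only trusted above a confidence threshold. `redecode_mdf3_comment` and `min_confidence` are hypothetical names, and the 0.8 default is an arbitrary choice.

```python
from typing import Callable, Optional

# A detector maps raw bytes to a chardet-style result such as
# {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}.
Detector = Callable[[bytes], dict]


def redecode_mdf3_comment(comment: str,
                          detector: Optional[Detector] = None,
                          min_confidence: float = 0.8) -> str:
    """Hypothetical helper: re-decode an MDF 3 comment that was read as latin-1.

    Returns the comment unchanged whenever detection is unavailable,
    uncertain, or names an encoding that cannot decode the bytes.
    """
    try:
        # latin-1 maps U+0000..U+00FF one-to-one back to bytes 0x00..0xFF,
        # so this recovers the bytes stored in the file.
        raw = comment.encode("latin-1")
    except UnicodeEncodeError:
        return comment  # not a latin-1 reading; leave it alone
    if raw.isascii():
        return comment  # ASCII decodes identically in latin-1, UTF-8 and GBK
    if detector is None:
        try:
            import chardet  # optional third-party dependency
        except ImportError:
            return comment
        detector = chardet.detect
    guess = detector(raw)
    encoding = guess.get("encoding")
    if not encoding or (guess.get("confidence") or 0.0) < min_confidence:
        return comment
    if encoding.lower() == "gb2312":
        encoding = "gbk"  # GBK is a superset of GB2312
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        return comment
```

With chardet installed the call would simply be `redecode_mdf3_comment(comment)`. Widening GB2312 to GBK is a design choice meant to avoid failures on characters outside GB2312; short comments give a statistical detector little evidence, which is why low-confidence guesses are ignored.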
Basically, you are getting a file that does not comply with the MDF standard. The developer of the tool that writes it should make sure to follow the standard.
I agree with Daniel's point: the specification mandates the text encoding (UTF-8 for MDF 4, latin-1 for MDF 3). However, Yu, you are also right: a user could add new channels or modify comments, and mdfreader does not check compliance with the specification, including character encoding, when writing a file. So introducing chardet could be an interesting option if you have time to contribute :)
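On the writing side, a compliance check of the kind described could be as small as the sketch below. It assumes the same version-to-encoding mapping (UTF-8 for MDF 4, latin-1 for MDF 3); `encode_mdf_text` is a hypothetical function, not something mdfreader currently provides. It refuses text the target version cannot represent instead of writing a non-compliant file.

```python
# Hypothetical mapping, following the rule stated in this thread.
MDF_TEXT_ENCODING = {3: "latin-1", 4: "utf-8"}


def encode_mdf_text(text: str, mdf_version: float) -> bytes:
    """Encode a comment for writing, refusing text the MDF version cannot store."""
    major = int(mdf_version)
    encoding = MDF_TEXT_ENCODING[major]
    try:
        # errors="strict" (the default): never silently substitutes '?'.
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"text cannot be written to an MDF {major} file as {encoding}: "
            f"{text[exc.start:exc.end]!r} at position {exc.start}"
        ) from exc
```

For example, `encode_mdf_text('贴马路牙子左右转向', 3)` raises `ValueError`, since those characters do not exist in latin-1; the caller could then write an MDF 4 file instead or ask the user to change the comment.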
Environment:
Comment as shown in the OS:
'贴马路牙子左右转向'
Comment as read by mdfreader:
'ÌùÂí·ÑÀ×Ó×óÓÒתÏò'
Error analysis and solution:
The raw comment is encoded in 'GBK'.
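The analysis can be reproduced in a few lines, assuming (as the MDF 3 rule implies) that the reader decoded the file's GBK bytes as latin-1. The wrong decode yields latin-1 mojibake broadly like the string shown above, and because latin-1 is a lossless byte-to-character mapping, re-encoding with latin-1 and decoding with GBK restores the original. `reproduce_gbk_mojibake` is a hypothetical helper name.

```python
from typing import Tuple


def reproduce_gbk_mojibake(text: str) -> Tuple[str, str]:
    """Return (garbled, repaired) for text written as GBK but read as latin-1."""
    raw = text.encode("gbk")          # bytes as the non-compliant tool wrote them
    garbled = raw.decode("latin-1")   # what a spec-following MDF 3 reader produces
    repaired = garbled.encode("latin-1").decode("gbk")  # undo the wrong decode
    return garbled, repaired
```

This repair is only safe when the comment is known to be GBK; for arbitrary files the encoding still has to be detected first.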