ladisk / pyuff

This module defines a UFF class for manipulating UFF (Universal File Format) files.

write_sets loses the string when id1 contains Chinese characters #87

Open masonacezllk opened 6 months ago

masonacezllk commented 6 months ago

When a data block contains a string with Chinese characters, for example id1='左前方向:S', write_sets loses the Chinese characters. How should this code be modified so that the Chinese string is preserved?

for k, v in dset.items():
    if type(v) == str:
        dset[k] = v.encode("utf-8").decode('ascii', 'ignore')
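For reference, a minimal standalone sketch of why that line drops the Chinese characters, and one possible relaxation (hypothetical, since the UFF standard expects ASCII; the function names below are illustrative only, not pyuff code):

def sanitize_ascii(value):
    # behaviour of the quoted line: non-ASCII bytes are silently dropped
    return value.encode('utf-8').decode('ascii', 'ignore')

def keep_utf8(value):
    # possible relaxation: leave the string untouched and write the file with a
    # UTF-8 capable encoding instead of forcing pure ASCII
    return value

print(sanitize_ascii('左前方向:S'))  # ':S' -> the Chinese characters are gone
print(keep_utf8('左前方向:S'))       # '左前方向:S'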

jankoslavic commented 6 months ago

Thank you @masonacezllk. Would you be so kind as to prepare a pull request with the proposed corrections and also a test case?
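For the record, one possible shape for such a test case (only a sketch: it reuses the test.unv file and the pyuff.UFF, read_sets and write_sets calls that appear later in this thread; the paths, dataset index and expected value are illustrative):

import pyuff

def test_non_ascii_id1_roundtrip():
    # read the sample file and put a non-ASCII id1 into one dataset
    data = pyuff.UFF(r'data\test.unv').read_sets()
    data[3]['id1'] = 'Time for 车速'

    # write to a new file and read it back
    pyuff.UFF(r'data\testnew.unv').write_sets(data, 'overwrite')
    data_new = pyuff.UFF(r'data\testnew.unv').read_sets()

    # currently fails: the Chinese characters are stripped on write
    assert data_new[3]['id1'] == 'Time for 车速'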

masonacezllk commented 6 months ago

test.zip Hi @jankoslavic, the zip file includes my UNV data, named test.unv. I want to replace the id1 of a data block in the original file with a new name that includes the Chinese '车速', write the new name to a new file, and reload it. But when I reload the new UNV file, the Chinese string '车速' is missing. This is my test code.

import pyuff

# original data file
fname=r'data\test.unv'
uffread = pyuff.UFF(fname)
data=uffread.read_sets()

# replace id1 with new name
data[3]['id1']='Time for 车速'

# save new name in new unv file
newfname=r'data\testnew.unv'
uffwrite = pyuff.UFF(newfname)
uffwrite.write_sets(data,'overwrite')

# load the new unv file
uffread_new = pyuff.UFF(newfname)
data_new = uffread_new.read_sets()
print(data_new[3]['id1'])
# prints 'Time for ' instead of 'Time for 车速'

jankoslavic commented 6 months ago

I guess this needs work to be prepared as a PR. Any volunteers?

jankoslavic commented 5 months ago

@masonacezllk I have now spent some time on this issue. The problem is that, by the UFF/UNV standard, the file should be ASCII, and therefore here: https://github.com/ladisk/pyuff/blob/ac669b9a5e92bf52a73b2c6bef1e7a68fbb187c7/pyuff/datasets/dataset_58.py#L908 we encode the data back to ASCII. The non-ASCII characters are lost at that step. We do support reading non-ASCII characters, but not writing them.
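A standalone reproduction of that step (not the pyuff code itself, just the same encode/decode round trip):

s = 'Time for 车速'
print(s.encode('utf-8').decode('ascii', 'ignore'))  # 'Time for ' -> the non-ASCII bytes are dropped
# keeping the characters would require writing the file with a UTF-8 capable
# encoding, which deviates from the ASCII-only standard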

This is a broader issue and I will open a new one.