Closed gaineleanor closed 2 years ago
Because I am a Chinese developer, my English is poor, so there may be errors in the expression. Please forgive me.
Wow! Thanks for this effort. Coincidentally, I'm working on this right now. I'll have time to review later in November.
Hi @selik, I have submitted code that adds a write function for the XPT v8 format (without the format and informat sections of the v9 format).
import pandas as pd
import xport
import xport.v89

# Read a v8 transport file and print its first dataset.
file = r'C:\Users\admin\Desktop\v8rock.v8xpt'
with open(file, 'rb') as f:
    library = xport.v89.load(f)
cc = next(iter(library.values()))
print(cc)

# Write a dataset with long variable names and labels in v8 format.
df = pd.DataFrame({
    'alphaf7we8f46we1f': [10, 20, 30],
    'beta': ['x', 'y', 'z'],
    'beta323fdfs': ['x', 'y', 'z'],
})
ds = xport.Dataset(df, name='888', label='mydataset')
for k, v in ds.items():
    v.label = k + 'this is a label that'
library = xport.Library({'888': ds})
with open('v8rock.v8xpt', 'wb') as f:
    xport.v89.dump(library, f)
I'll respond to this by the end of December.
Let's talk about supporting v8/9 in #10.
For text encoding:
Transport files that were created by SAS releases before 9.2 are not stamped with encoding values. ... The encoding of the character data is stamped in transport files that are created using SAS versions 9.2 and later.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p0s4baszwvxumqn1p4cxbvnrpn8p.htm
It's not clear how the text encoding is "stamped" in the file. If we had an example of a SAS Transport file stamped with a text encoding, we could implement it. Without that, I don't know if there's anything better to do than assume ISO-8859-1. You can always encode/decode to recover the original characters.
In [1]: '\N{snowman}'.encode().decode('ISO-8859-1')
Out[1]: 'â\x98\x83'
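To illustrate the round trip, here is a minimal sketch using only the Python standard library. It assumes the bytes in the file were UTF-8 and were read back as ISO-8859-1; the variable names are made up for this example.

```python
# Text whose UTF-8 bytes were decoded as ISO-8859-1 (mojibake).
original = '\N{SNOWMAN}'
garbled = original.encode('utf-8').decode('ISO-8859-1')

# ISO-8859-1 maps each byte 0x00-0xFF to exactly one code point, so
# re-encoding restores the original bytes, which then decode correctly.
recovered = garbled.encode('ISO-8859-1').decode('utf-8')

print(repr(garbled))
print(recovered == original)
```

This only works because ISO-8859-1 never fails to decode a byte. An encoding with unassigned bytes, such as CP-1252, would not guarantee the same round trip.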
Replying to: https://github.com/selik/xport/issues/10#issuecomment-1001299035
@gaineleanor I think a reasonable option would be to have an xport.encoding variable defaulted to 'ISO-8859-1' to stay consistent with current behavior. All the encode/decode could look up xport.encoding at runtime. Or, I could add encoding='ISO-8859-1' to every method. That might be "better" design, but a bigger change.
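The two options could look roughly like the sketch below. This is not code from xport: `config`, `encode_text`, and the variable names are hypothetical stand-ins, with `config.encoding` playing the role of the proposed xport.encoding.

```python
import types

# Hypothetical stand-in for a module-level `xport.encoding` setting.
config = types.SimpleNamespace(encoding='ISO-8859-1')


def encode_text(text, encoding=None):
    """Encode text, reading the shared default at call time."""
    # The default is looked up here, not in the signature, so a later
    # change to config.encoding affects every subsequent call.
    return text.encode(encoding or config.encoding)


# Option 1: change the shared default once.
latin1_bytes = encode_text('é')      # uses ISO-8859-1
config.encoding = 'cp1252'
cp1252_bytes = encode_text('€')      # now uses CP-1252

# Option 2: pass the encoding explicitly on each call.
utf8_bytes = encode_text('é', encoding='utf-8')
```

The shared default is the smaller change, while the explicit parameter avoids hidden global state at the cost of touching every method.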
We want to allow CP-1252 specifically. #30
I have the same issue (Chinese characters).
@dtboy1995 Are you trying to write or read XPT format with Chinese characters?
@selik Thanks for your reply. I am trying to write Chinese characters in the v56 format, and it throws errors.
import pandas as pd
import xport
import xport.v56
df1 = pd.read_csv('input.csv')
df2 = pd.read_csv('input.csv')
ds1 = xport.Dataset(df1, name='SPEC1', sas_os='X64_DS12', sas_version='9.4')
ds2 = xport.Dataset(df2, name='SPEC2', sas_os='X64_DS12', sas_version='9.4')
library = xport.Library({'SPEC1': ds1, 'SPEC2': ds2})
with open('output.xpt', 'wb') as f:
    xport.v56.dump(library, f)
print("done")
@dtboy1995 The encoding is not supported.
with xport.v56._encoding(data='utf-8', metadata='Windows-1252'):
    bytestring = xport.v56.dumps(library)
with xport.v56._encoding(data='utf-8', metadata='Windows-1252'):
    library = xport.v56.loads(bytestring)
  File "/usr/local/lib/python3.10/site-packages/xport/v56.py", line 1008, in dumps
    return bytes(Library(library))
  File "/usr/local/lib/python3.10/site-packages/xport/v56.py", line 757, in __bytes__
    return self._bytes()
  File "/usr/local/lib/python3.10/site-packages/xport/v56.py", line 765, in _bytes
    b'members': b''.join(bytes(Member(member)) for member in self.values()),
  File "/usr/local/lib/python3.10/site-packages/xport/v56.py", line 765, in <genexpr>
89 implements a "beta" feature for this. Check it out and please give feedback.
with xport.v56._encoding(data='utf-8', metadata='Windows-1252'):
    bytestring = xport.v56.dumps(library)
with xport.v56._encoding(data='utf-8', metadata='Windows-1252'):
    library = xport.v56.loads(bytestring)
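For readers unfamiliar with this pattern, a context manager that temporarily overrides encoding settings can be sketched as follows. This is an illustration of the general technique only, not the library's implementation: `settings` and `use_encoding` are hypothetical names, and the standard library is all it uses.

```python
import contextlib
import types

# Hypothetical settings object; the real library keeps its own state.
settings = types.SimpleNamespace(data='ISO-8859-1', metadata='ISO-8859-1')


@contextlib.contextmanager
def use_encoding(data=None, metadata=None):
    """Temporarily override the encodings, restoring them on exit."""
    previous = (settings.data, settings.metadata)
    settings.data = data or settings.data
    settings.metadata = metadata or settings.metadata
    try:
        yield settings
    finally:
        # Runs even if the body raises, so the override never leaks.
        settings.data, settings.metadata = previous


with use_encoding(data='utf-8', metadata='Windows-1252'):
    inside = '中文'.encode(settings.data)

after = settings.data
```

Inside the block the text is encoded as UTF-8; once the block exits, the previous ISO-8859-1 setting is back in place.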
It worked for me. Thank you!!!
Hello selik, could you consider adding encoding options? In practice, it is mainly the variable names, variable labels, and character values that may be in another encoding. I submitted the code for reading the v8 format, and I want to try adding encoding options to it, but I am stuck at the map call
members=map(MemberV8.from_bytes, chunks),
Please help me see if there is a good way to do this. I will update the code for writing the v8 format later when I have time. Please help review it. Thanks.
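One common way to pass an extra argument through map is functools.partial. The sketch below assumes a from_bytes classmethod that accepts an encoding keyword; that signature is an assumption for illustration, and the MemberV8 class here is a simplified stand-in, not the real one.

```python
import functools


class MemberV8:
    """Hypothetical stand-in; not the real xport class."""

    def __init__(self, name):
        self.name = name

    @classmethod
    def from_bytes(cls, chunk, encoding='ISO-8859-1'):
        # Assumed signature: the real from_bytes may not accept `encoding`.
        return cls(chunk.decode(encoding).strip())


chunks = ['变量'.encode('utf-8'), b'BETA    ']

# partial fixes the keyword argument, so map still calls it with one chunk.
parse = functools.partial(MemberV8.from_bytes, encoding='utf-8')
members = list(map(parse, chunks))
names = [member.name for member in members]
```

A generator expression such as `(MemberV8.from_bytes(c, encoding='utf-8') for c in chunks)` would do the same job without functools.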