selik / xport

Python reader and writer for SAS XPORT data transport files.
MIT License

SAS V8 Support #110

Open jehugaleahsa opened 9 months ago

jehugaleahsa commented 9 months ago

Hello:

I am a software developer in the clinical industry. A few years ago, I dug into the SAS V8 documentation and got my company's Java implementation to support V5/V8/V9. Here was the documentation I used: https://support.sas.com/content/dam/SAS/support/en/technical-papers/record-layout-of-a-sas-version-8-or-9-data-set-in-sas-transport-format.pdf. To be honest, that document is pretty horrible from an implementor's point of view, but I managed to figure it out.

The motivation was supporting Asian regulatory agencies, like the PMDA and NMPA, where UTF-8 characters can be 4 bytes each. Because V5 limits variable names to 8 bytes, that meant variable names in V5 could be only two characters long. 😬
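To make the arithmetic concrete, here is a minimal sketch of the byte budget. It assumes names are stored as UTF-8 in the 8-byte V5 name field, as described above; the helper names `max_chars_in_field` and `fits_v5_name` are hypothetical and are not part of any library.

```python
# Assumption: a V5 XPORT variable name occupies a fixed 8-byte field.
V5_NAME_LIMIT_BYTES = 8


def max_chars_in_field(char: str, field_bytes: int = V5_NAME_LIMIT_BYTES) -> int:
    """Return how many copies of `char` fit in a field of `field_bytes` bytes."""
    return field_bytes // len(char.encode("utf-8"))


def fits_v5_name(name: str) -> bool:
    """Return True if `name`, encoded as UTF-8, fits in the 8-byte V5 field."""
    return len(name.encode("utf-8")) <= V5_NAME_LIMIT_BYTES


if __name__ == "__main__":
    print(max_chars_in_field("A"))           # ASCII: 1 byte per character
    print(max_chars_in_field("\U00020000"))  # a 4-byte CJK extension character
    print(fits_v5_name("体重指数"))            # four 3-byte characters = 12 bytes
```

An ASCII name gets all eight characters, while a name built from 4-byte characters gets two, which is the limit the extended V8 names remove.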

At the moment, there is an organization called CDISC that is pushing to move folks away from SAS datasets for submissions to the FDA. This is their proposal: https://www.cdisc.org/dataset-json. I am trying to come up with a more general-purpose specification, but I need to provide scripts that can run in SAS to convert datasets to my preferred format. I am able to use pandas for reading the data itself (although it requires everything sitting in memory at once). I'd like to use your library for reading metadata, like dataset names and labels, and variable names, labels, types, and formats. Eventually, if memory constraints become an issue, your library will give me more control over how I process the file.

Is adding V8/V9 support (extended names, labels, and formats) something you'd be willing to investigate? I bet I could figure out how to apply my Java changes to your Python code, but I'd need to really dig in. Let me know.

shenzj1994 commented 7 months ago

I might be wrong, but it seems like this project already supports V8/V9. You can try:

import xport.v89
with open('example.xpt', 'rb') as f:
    library = xport.v89.load(f) 

instead of import xport.v56
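If it is not known in advance which module applies, one option is to peek at the first 80-byte record of the file. The sketch below is only an assumption-laden illustration: the marker strings `LIBRARY` and `LIBV8` are taken from my reading of the SAS record-layout papers, not from this thread, and `sniff_xport_version` is a hypothetical helper, not part of the xport package.

```python
RECORD_LENGTH = 80  # transport files are organized as 80-byte records


def sniff_xport_version(first_record: bytes) -> str:
    """Guess which module name ('v56' or 'v89') matches the first header record.

    Assumption: a V5 file starts with a 'LIBRARY HEADER RECORD' line and a
    V8/V9 file starts with a 'LIBV8 HEADER RECORD' line.
    """
    header = first_record[:RECORD_LENGTH]
    if not header.startswith(b"HEADER RECORD*******"):
        raise ValueError("not a SAS transport file: missing header record")
    if b"LIBV8" in header:
        return "v89"
    if b"LIBRARY" in header:
        return "v56"
    raise ValueError("unrecognized library header record")


def sniff_file(path: str) -> str:
    """Read only the first record of `path` and classify it."""
    with open(path, "rb") as f:
        return sniff_xport_version(f.read(RECORD_LENGTH))
```

Calling `sniff_file('example.xpt')` would then indicate whether to try `xport.v89` or `xport.v56`; it reads a single record, so it does not load the data into memory.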