selik / xport

Python reader and writer for SAS XPORT data transport files.
MIT License

Add support for CPORT (compressed XPORT format) #6

Status: open. selik opened this issue 8 years ago.

selik commented 8 years ago

It seems some archaic FDA submission rules require(d) SAS XPT or CPT-format files. The Aggregate Analysis of ClinicalTrials.gov Database hosts the same data in Oracle "dmp", pipe-delimited text, and SAS CPORT formats. Perhaps we can use these files as a sort of Rosetta stone to infer the specification of the SAS CPT/CPORT format.

dhanababum commented 7 years ago

Hi, great contribution. I need some information about CPORT. My client needs to read .cpt files in Python, but I can't find a layout specification for CPORT like the one SAS publishes for XPORT (https://support.sas.com/techsup/technote/ts140.pdf). Did you find any information about the CPORT format? Or is any help needed on this?

selik commented 7 years ago

@dhanababum I believe it stands for "Compressed export" or something like that. Unfortunately, we'd have to reverse-engineer it.

The binary CPORT format is not openly documented. The data values in files produced by PROC CPORT can be compressed and the files may be password-protected.

https://www.loc.gov/preservation/digital/formats/fdd/fdd000464.shtml#notes
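Since these files circulate under several extensions (.xpt, .cpt, .TRN, .XPORT), a reader would probably need to detect the format from the file's contents rather than its name. Below is a minimal sketch. The XPORT library header strings come from SAS's transport-format layout notes (TS-140 for V5/V6, plus the LIBV8 variant for V8/V9). The `**COMPRESSED**` prefix for CPORT is an assumption based on files seen in the wild, not on any published spec. The function names are hypothetical and are not part of the xport package.

```python
# Hypothetical helper, not part of the xport package.

# Documented in SAS TS-140: V5/V6 XPORT files begin with this library header record.
XPORT_V5_PREFIX = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"
# V8/V9 transport files use "LIBV8" in the library header record.
XPORT_V8_PREFIX = b"HEADER RECORD*******LIBV8 HEADER RECORD!!!!!!!"
# ASSUMPTION: CPORT files appear to start with repeated "**COMPRESSED**" markers.
CPORT_PREFIX = b"**COMPRESSED**"


def sniff_transport_format(header: bytes) -> str:
    """Classify a SAS transport file from its first 80-byte record."""
    if header.startswith(XPORT_V5_PREFIX):
        return "xport-v5"
    if header.startswith(XPORT_V8_PREFIX):
        return "xport-v8"
    if header.startswith(CPORT_PREFIX):
        return "cport"
    return "unknown"


def sniff_file(path) -> str:
    """Read only the first record, so large files are never fully loaded."""
    with open(path, "rb") as f:
        return sniff_transport_format(f.read(80))
```

Checking content first would let a reader fail with a clear "this is CPORT" message instead of a confusing parse error partway through the file.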

smiiil commented 3 years ago

Hi, I appreciate all the work on this. I'm also running into problems opening compressed transport files. Any luck reading CPORT files with Python?

selik commented 3 years ago

@smiiil Sorry, I haven't gotten around to it, and I don't expect to for a while. I'm happy to coach you through it, though.

smiiil commented 3 years ago

Sure, willing to help.

selik commented 3 years ago

My design idea was that the cport module could extend classes from the v56 module, trying to reuse as much of the logic as possible.

Unfortunately, there seem to be some bugs in the latest version, so maybe it's best to start by fixing those, which'd get you familiar with the logic. The decision to extend Pandas made the code much more complex. Hopefully it made the API more pleasant, but I've started to worry that it was a mistake.
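The extend-and-reuse idea above could look roughly like the template-method sketch below. Everything here is hypothetical. `V56Reader`, `CPortReader`, `raw_bytes`, and `decompress` are illustrative stand-ins, not the real xport.v56 API. The sketch also assumes that a CPORT file can be decoded into an XPORT-like stream of 80-byte records, which nobody in this thread has confirmed.

```python
import io


class V56Reader:
    """Hypothetical stand-in for reusable XPORT v5/v6 parsing logic."""

    RECORD_LENGTH = 80  # XPORT files are built from fixed 80-byte records.

    def __init__(self, fp):
        self.fp = fp

    def raw_bytes(self) -> bytes:
        """Hook: return the plain XPORT byte stream (already uncompressed)."""
        return self.fp.read()

    def records(self):
        """Split the stream into fixed-length records; shared by subclasses."""
        data = self.raw_bytes()
        n = self.RECORD_LENGTH
        return [data[i:i + n] for i in range(0, len(data), n)]


class CPortReader(V56Reader):
    """Hypothetical CPORT reader: only the decoding hook is overridden."""

    def raw_bytes(self) -> bytes:
        return self.decompress(self.fp.read())

    def decompress(self, data: bytes) -> bytes:
        # ASSUMPTION: CPORT wraps an XPORT-like stream. The actual compression
        # scheme is undocumented and would have to be reverse-engineered.
        raise NotImplementedError("CPORT decompression is not known yet")
```

The appeal of this shape is that all record parsing stays in one place, so only `decompress` needs to be filled in once the format is understood. If CPORT turns out to use an entirely different record layout, the subclass would have to override more than this one hook.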

cmdugan13 commented 2 years ago

Thanks for all of your work @selik! Following this thread since I also am running into the issue with CPORT files.

selik commented 2 years ago

@cmdugan13 Is the CPORT file you're trying to read publicly available?

cmdugan13 commented 2 years ago

It is. I can't link the file directly, but it's C2419P1M.XPORT inside the attached folder, 2021 Midyear-Final-Model Software.zip. It's on CMS's website, if you need the source.

selik commented 2 years ago

I'll take a look next weekend / late January.

selik commented 2 years ago

This is going to be tricky. SAS Universal Viewer doesn't support CPORT files. Apparently the universe is smaller than we thought. https://support.sas.com/kb/42/356.html

lscott15 commented 2 years ago

I found some sample datasets that CMS published in 2014. They're available in both TRN and TXT form, if it helps: STDIAG.TRN, STDIAG.TXT
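Paired files like these make the "Rosetta stone" approach from the original post concrete. One simple first step is a known-plaintext search: take values from the text version and look for them verbatim in the binary. Values found verbatim were probably stored uncompressed, while values that never appear suggest compression or a different encoding. This is a minimal sketch. The function name is hypothetical, and splitting the TXT file on whitespace is an assumption, since the layout of STDIAG.TXT (fixed-width versus delimited) isn't described here.

```python
import re
from typing import Dict, List


def find_plaintext_anchors(binary: bytes, text: str, min_len: int = 4) -> Dict[str, List[int]]:
    """Map each distinct text token to the byte offsets where it occurs in binary.

    ASSUMPTION: tokens are whitespace-separated. Short tokens (< min_len bytes)
    are skipped because they match by chance too often to be informative.
    """
    anchors: Dict[str, List[int]] = {}
    for token in sorted(set(re.findall(r"\S+", text))):
        needle = token.encode("latin-1", errors="ignore")
        if len(needle) < min_len:
            continue
        # Non-overlapping exact matches of the token's bytes.
        anchors[token] = [m.start() for m in re.finditer(re.escape(needle), binary)]
    return anchors
```

For example, you could read STDIAG.TRN in binary mode and STDIAG.TXT as text, then compare how many tokens have offsets with how many have none. Clusters of nearby offsets would mark regions worth inspecting by hand in a hex editor.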

selik commented 2 years ago

@lscott15 Thanks for the tip. I'll check it out.

thekevshow commented 1 year ago

Has there been any progress on this? I'm also willing to help.