tk3369 / SASLib.jl

Julia library for reading SAS7BDAT data sets

SASLib produces different (and sometimes obviously incorrect) output vs. readstat/ReadStat.jl #80

Status: open. Opened by kleinschmidt 2 years ago.

kleinschmidt commented 2 years ago

Sorry for the vague bug report, but I can't share the data files we're working with because they're confidential.

However, on both sas7bdat files I've tried with SASLib.jl, the output is either obviously wrong or diverges from what readstat (CSV export read with CSV.jl) and ReadStat.jl (which calls the readstat C API directly) produce. For the first file, the output is obviously wrong: SASLib produces a table with the correct schema (column names and types), but every value is 0.0 or "". For the second file, the structure again appears correct, but some values are wrong: many numeric values differ, while the strings appear to be okay.

Again, sorry I can't share more details, but I'm willing to do some debugging if you have suggestions about where to start!

tk3369 commented 2 years ago

This might be related to #53. Unfortunately, I've been really busy lately. If you can fix this bug, you'll be my hero :-)

tk3369 commented 2 years ago

On another note, I have also seen issues when the SAS file is compressed. If you also control the upstream data pipeline, you could experiment with different compression options.

tk3369 commented 2 years ago

It would be helpful if you could take a dataset, mask the data with random values, and share that file for testing. A smaller file is also easier to work with, as long as it still reproduces the same problem.