tk3369 / SASLib.jl

Julia library for reading SAS7BDAT data sets

Missing data from first META page #53

Open tk3369 opened 5 years ago

tk3369 commented 5 years ago

It seems that data residing in the first META page is missing. I suspect this was introduced in the last major refactoring.

Examples:

data_pandas/test2.sas7bdat

julia> readsas("data_pandas/test2.sas7bdat")
Read data_pandas/test2.sas7bdat with size 10 x 100 in 0.00088 seconds
SASLib.ResultSet (10 rows x 100 columns)
Columns 1:Column1, 2:Column2, 3:Column3, 4:Column4, 5:Column5, 6:Column6, 7:Column7, 8:Column8, 9:Column9, 10:Column10 …
1: 0.0, , 0.0, 1960-01-01, 0.0, , 0.0, 0.0, 0.0, 
2: 0.0, , 0.0, 1960-01-01, 0.0, , 0.0, 0.0, 0.0, 
3: 0.0, , 0.0, 1960-01-01, 0.0, , 0.0, 0.0, 0.0, 
4: 0.0, , 0.0, 1960-01-01, 0.0, , 0.0, 0.0, 0.0, 
5: 0.0, , 0.0, 1960-01-01, 0.0, , 0.0, 0.0, 0.0, 

data_AHS2013/omov.sas7bdat

The first 103 records are missing compared with the output from ReadStat.

pmbaumgartner commented 3 years ago

I've run into this bug recently. I have a dataset (which I unfortunately can't share) where the first 48 rows are skipped. Instead, "empty" rows end up appended at the bottom of the dataset, i.e. I see rows like the ones above, filled with 0.0 or blank values.

xiaodaigh commented 3 years ago

Can you create a synthetic dataset and try to replicate the issue? Something with a similar structure but random values. Then we can see how it behaves.


pmbaumgartner commented 3 years ago

I'll try to generate something that replicates this. I suspect it has to do with the size of the dataset: it has 1800 columns, and that seems to trip up every tool I throw at it.

xiaodaigh commented 3 years ago

"I've got 1800 columns"

If you can generate a synthetic one that fails, I can also add the file here for others to test: https://github.com/xiaodaigh/sas7bdat-resources

The hardest thing about SAS is getting sample files.