tk3369 / SASLib.jl

Julia library for reading SAS7BDAT data sets

Problem reading file due to BoundsError in _process_columnname_subheader #44

Closed tk3369 closed 6 years ago

tk3369 commented 6 years ago

Error when reading this file https://github.com/ppham27/sas_to_csv/blob/master/test_files/flightdelays.sas7bdat

julia> readsas("E:/ppham27/flightdelays.sas7bdat")
Warning: Unknown file encoding value (0), defaulting to UTF-8
ERROR: BoundsError: attempt to access 1-element Array{Array{UInt8,1},1} at index [2]
Stacktrace:
 [1] _process_columnname_subheader(::SASLib.Handler, ::Int64, ::Int64) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:744
 [2] _process_subheader(::SASLib.Handler, ::Int64, ::SASLib.SubHeaderPointer) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:582
 [3] _process_page_metadata(::SASLib.Handler) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:476
 [4] _process_page_meta(::SASLib.Handler) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:427
 [5] _parse_metadata(::SASLib.Handler) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:400
 [6] _open(::SASLib.ReaderConfig) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:35
 [7] #readsas#16(::String, ::Bool, ::Array{Any,1}, ::Array{Any,1}, ::Dict{Any,Any}, ::Dict{Any,Any}, ::Dict{Symbol,Type}, ::Int64, ::SASLib.#readsas, ::String) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:162
 [8] readsas(::String) at C:\Users\Folio\.julia\v0.6\SASLib\src\SASLib.jl:160
tk3369 commented 6 years ago

The issue is that these files have their metadata split across several pages, and those pages are not necessarily all at the top of the file. The current code assumes that all meta info appears at the top, before any data, which isn't the case here.

Layout for supervisors.sas7bdat:

  1. MIX (which contains meta info, data)
  2. AMD (which contains the rest of meta info)

Layout for flightdelays.sas7bdat:

  1. MIX (which contains meta info, data)
  2. DATA (10 pages)
  3. AMD (which contains the rest of meta info)
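
To make the two layouts above concrete, here is a rough sketch of why stopping at the first page with data misses the trailing AMD page. It is not the SASLib.jl code and not necessarily how the fix works: `PageType`, `has_metadata`, `metadata_pages_stop_early` and `metadata_pages_full_scan` are hypothetical names, and treating MIX and AMD pages as the ones that carry meta info is an assumption taken from the layouts listed here.

```julia
# Hypothetical page-type tags, mirroring the labels used in the layouts above.
@enum PageType META MIX DATA AMD

# Assumption: these are the page types that can carry meta info.
has_metadata(p::PageType) = p in (META, MIX, AMD)

# Mimics the old assumption: once a page containing data is seen,
# metadata is considered complete and scanning stops.
function metadata_pages_stop_early(pages::Vector{PageType})
    found = Int[]
    for (i, p) in enumerate(pages)
        has_metadata(p) && push!(found, i)
        p in (MIX, DATA) && break
    end
    return found
end

# One possible alternative: look at every page, so that metadata
# appearing after the data pages is still picked up.
function metadata_pages_full_scan(pages::Vector{PageType})
    return [i for (i, p) in enumerate(pages) if has_metadata(p)]
end

# Layout of flightdelays.sas7bdat as described above: MIX, 10 x DATA, AMD.
flightdelays = vcat([MIX], fill(DATA, 10), [AMD])

println(metadata_pages_stop_early(flightdelays))  # [1]
println(metadata_pages_full_scan(flightdelays))   # [1, 12]
```

In this toy model the early-stopping scan never reaches page 12, so whatever meta info lives on the AMD page is never read. That is consistent with the column-name lookup above running out of entries, though the sketch does not reproduce the actual BoundsError.
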
tk3369 commented 6 years ago

Fixed in v0.6.2

xiaodaigh commented 6 years ago

Not sure if one should worry about this, or about the issue created in #50, because when I tried to open the file in SAS it said:

The open data operation failed. The following error occurred [Error] File xxxxx.......DATA is not a SAS data set.

I assume the data was written by something like haven?