Open louisbrulenaudet opened 8 months ago
There's another solution: Arrow export from the existing SAS libraries. With that in place we could simply zero-copy the output into Polars without having to write an entire (complicated) SAS-parsing I/O stack (which I suspect there is little appetite for). Could be worth opening an issue on the various projects, requesting efficient Arrow export 😉 Otherwise some intermediate conversions are likely the way to go for now...
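To illustrate the Arrow route mentioned above, here is a minimal sketch of the Polars side only. It assumes some SAS reader could hand back a `pyarrow.Table`; no such exporter is claimed to exist, so the table in the usage note is built by hand as a stand-in. `arrow_to_polars` is a hypothetical helper name, while `pl.from_arrow` is the existing Polars entry point, which the Polars documentation describes as zero-copy for the most part.

```python
def arrow_to_polars(table):
    """Wrap an Arrow table as a Polars DataFrame.

    `table` is assumed to be a pyarrow.Table produced by some upstream
    reader (hypothetical for SAS files at the time of this thread).
    """
    # Imported lazily: polars is only needed when the helper is called.
    import polars as pl

    # from_arrow reuses the Arrow buffers where the types allow it,
    # so no intermediate pandas DataFrame is materialised.
    return pl.from_arrow(table)
```

Usage, with a hand-built table standing in for the output of an Arrow-capable SAS reader: `arrow_to_polars(pyarrow.table({"patient_id": [1, 2], "dose": [0.5, 1.0]}))`.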
Out of curiosity, what are the major domains that use these files? I've never come across them in finance; are they somewhat domain-specific?
SAS files are integral to the health sector, especially when dealing with health authorities and regulators. SAS facilitates regulatory compliance, which makes it a common choice among health professionals. Polars support would be very much appreciated.
Description
Dear developers,
As SAS is a proprietary language used at scale, it would be beneficial to introduce support for reading SAS dataset files (.sas7bdat), so as not to have to use third-party libraries to perform a time-consuming and sub-optimal series of conversions.
Today, it is possible to use Dask to parallelize reading with pyreadstat, but the Dask DataFrame must then be converted to a Pandas DataFrame, which is in turn converted to a Polars DataFrame. The conversion from Dask to Pandas is relatively slow and cumbersome in a production environment.
Two solutions can be envisaged: either Dask support within Polars, or SAS support to guarantee Polars' autonomous operation. Also, integrating progress bar support would be very useful, especially since .sas7bdat files are generally used for tables containing more than 1000 columns.
Best regards, Louis
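For reference, the intermediate-conversion workaround described above can be sketched without Dask, by reading the file in row chunks with pyreadstat and converting each chunk to Polars as it arrives. This is only a sketch under assumptions: `chunk_bounds`, `read_sas_to_polars` and the `on_progress` callback are hypothetical names, the file's metadata is assumed to report its row count, and `pl.from_pandas` needs pyarrow installed.

```python
from typing import Callable, Iterator, Optional, Tuple


def chunk_bounds(total_rows: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (row_offset, row_limit) pairs that together cover total_rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, total_rows, chunk_size):
        yield offset, min(chunk_size, total_rows - offset)


def read_sas_to_polars(
    path: str,
    chunk_size: int = 100_000,
    on_progress: Optional[Callable[[int, int], None]] = None,
):
    """Read a .sas7bdat file chunk by chunk and return one Polars DataFrame."""
    # Optional dependencies are imported lazily so chunk_bounds works alone.
    import polars as pl
    import pyreadstat

    # A metadata-only pass returns an empty frame plus the file metadata.
    empty_pdf, meta = pyreadstat.read_sas7bdat(path, metadataonly=True)
    total = meta.number_rows  # assumption: the file reports its row count

    frames = []
    done = 0
    for offset, limit in chunk_bounds(total, chunk_size):
        pdf, _ = pyreadstat.read_sas7bdat(path, row_offset=offset, row_limit=limit)
        frames.append(pl.from_pandas(pdf))  # pandas -> Polars, per chunk
        done += len(pdf)
        if on_progress is not None:
            on_progress(done, total)  # hook for a progress bar

    if not frames:
        # Empty file: keep the column names from the metadata-only read.
        return pl.from_pandas(empty_pdf)
    return pl.concat(frames)
```

A call such as `read_sas_to_polars("table.sas7bdat", on_progress=lambda done, total: print(done, "/", total))` would report progress after each chunk (the file name is a placeholder). If a column's inferred dtype differs between chunks, for example when one chunk is entirely null, `pl.concat(frames, how="vertical_relaxed")` may be needed instead.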