modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

Enable read_sas #1983

Open Speccles96 opened 4 years ago

Speccles96 commented 4 years ago

I was trying to use read_sas with Modin and received this message:

UserWarning: `read_sas` defaulting to pandas implementation.
To request implementation, send an email to feature_requests@modin.org.

No other Python package lets you parallelize read_sas without a difficult workaround. This would be a nice feature for the community and for people who use .sas7bdat files regularly.

devin-petersohn commented 4 years ago

Hi @Speccles96, thanks for posting! I agree this would be a really good feature for the community.

I will flag this as Help Wanted until we get the chance to implement it ourselves. I/O is fairly abstract in Modin, so familiarity with the codebase is probably necessary to implement this.

jbrockmendel commented 1 year ago

Looking into the pandas sas code, it looks like we'd need to vendor and adapt a bunch of it. Effectively we'd need to add a skip_rows-like arg to the low-level reader. Might be worth trying to get that as a feature in pandas so we don't have to maintain the extra code here.
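
To illustrate why a skip_rows-like argument matters for parallelism, here is a rough sketch of the partitioning step only. It assumes the total row count is already known and that a low-level reader could accept a starting row plus a row count. `partition_row_ranges` is a hypothetical helper written for this example, not part of pandas or Modin.

```python
from typing import List, Tuple


def partition_row_ranges(total_rows: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split total_rows into contiguous (skip_rows, nrows) ranges.

    Hypothetical helper: each tuple is what one worker would pass to a
    reader that supports skipping rows, so workers read disjoint slices.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    # Spread the remainder over the first `extra` partitions so sizes
    # differ by at most one row.
    base, extra = divmod(total_rows, num_partitions)

    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(num_partitions):
        nrows = base + (1 if i < extra else 0)
        if nrows == 0:
            # Fewer rows than partitions: stop instead of emitting empty ranges.
            break
        ranges.append((start, nrows))
        start += nrows
    return ranges


if __name__ == "__main__":
    # Example: 10 rows across 3 workers.
    for skip_rows, nrows in partition_row_ranges(10, 3):
        print(f"worker reads {nrows} rows after skipping {skip_rows}")
```

Each worker would then call the reader with its own range and the results would be concatenated in order. Without a way to skip rows in the low-level reader, every worker would have to parse the file from the beginning, which defeats the purpose.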