vnmabus / rdata

Reader of R datasets in .rda format, in Python
https://rdata.readthedocs.io
MIT License
45 stars 2 forks source link

New features: faster xdr reader, ascii file reader, basic writers #31

Closed trossi closed 3 months ago

trossi commented 10 months ago

Hi @vnmabus! This package has been greatly useful and I have implemented some new features (faster reading of arrays, a reader for ascii files, and xdr and ascii writers for basic types). Would such features fit to your plans for the rdata package? I could submit them in a few separate PRs to make the review smoother.

vnmabus commented 10 months ago

They definitely fit. In fact, I mentioned in https://github.com/vnmabus/rdata/issues/13 the need for faster reading of arrays.

Furthermore, we need to move away from xdrlib, as it will be removed in Python 3.13. I want to discuss this topic with the reviewers of PyOpenSci (https://github.com/pyOpenSci/software-submission/issues/144), @rich-iannone, @has2k1 and @isabelizimm because I do not know if there is a semi-official successor package, or if it is better to vendor it. As I understand, your proposal is to add the XDR reader/writer to this project, instead of depending on an external parser, am I right?

As for the ASCII format, I guess that the implementation is to just add a parser subclass and overwrite the reading methods therein. At least that is what I thought that could be done (but nobody asked for that feature so I did not bother implementing it). I will definitively add it if you create a PR.

I have more doubts about the writing methods, as currently this package only reads and does not write XDR (but again, I am not opposed to it, it is just that I did not need it personally and it is a ton of work). I guess that you just added the methods for writing XDR basic types and arrays, which is fine.

trossi commented 10 months ago

They definitely fit. In fact, I mentioned in https://github.com/vnmabus/rdata/issues/13 the need for faster reading of arrays.

Great!

Furthermore, we need to move away from xdrlib, as it will be removed in Python 3.13. I want to discuss this topic with the reviewers of PyOpenSci (https://github.com/pyOpenSci/software-submission/issues/144), @rich-iannone, @has2k1 and @isabelizimm because I do not know if there is a semi-official successor package, or if it is better to vendor it. As I understand, your proposal is to add the XDR reader/writer to this project, instead of depending on an external parser, am I right?

Yes, I did reading directly with numpy and a positive side effect is indeed that deprecated xdrlib isn't needed anymore.

As for the ASCII format, I guess that the implementation is to just add a parser subclass and overwrite the reading methods therein. At least that is what I thought that could be done (but nobody asked for that feature so I did not bother implementing it). I will definitively add it if you create a PR.

That's right, exactly so. ASCII format is quite handy for debugging and testing, but not really that useful for production use.

I have more doubts about the writing methods, as currently this package only reads and does not write XDR (but again, I am not opposed to it, it is just that I did not need it personally and it is a ton of work). I guess that you just added the methods for writing XDR basic types and arrays, which is fine.

Yes, I implemented xdr/ascii writer classes analogous to parsers, and conversion functions for lists, dicts, and numpy arrays to R-like format. I agree that covering all types would be a lot of work and the probably there isn't unique mapping between Python and R types either. I basically focused only to types that I needed for a project.