Open ivirshup opened 1 year ago
BTW pandas 2.0 will have a pyarrow backend... I wonder how that will work for bioframe.
> BTW pandas 2.0 will have a pyarrow backend
Yup, I've already opened issues around the release candidate😅. Not actually that sure how much the current pyarrow backend is changing, or if it's just not experimental anymore.
But, while pyarrow will probably have better performance than pandas (especially with strings), I think backends like duckdb or polars have the much larger benefit of being able to work with out-of-core data efficiently.
I am collaborating with the bioframe authors on this project (not in a usable state yet): https://github.com/endrebak/poranges
Related to this, there is a request for input on defining a dataframe standard: https://data-apis.org/blog/dataframe_standard_rfc/
Hey all,
I was wondering if you had considered supporting alternative dataframe classes in this library? In particular, I was thinking about the lazy/accelerated ones built on Arrow (e.g. polars, datafusion).
I would hope that the current API could be amenable to this by `@singledispatch`ing functions to different backends. It could also be nice to take advantage of a backend that is able to work with out-of-core amounts of data and do optimizations based on column order.
I've also been having a good time interacting with annotation resources via `ibis`, which could integrate nicely with this kind of approach.
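To make the `@singledispatch` idea concrete, here is a rough sketch of what I have in mind. It is not bioframe code: `select_chrom` is a hypothetical helper, and the list-of-dicts and dict-of-lists inputs are just dependency-free stand-ins for a row-oriented and a columnar backend. The only thing assumed about the data is the `chrom`/`start`/`end` column naming. A real version would presumably register `pandas.DataFrame`, `polars.DataFrame`, and so on in the same way (the pandas registration below is guarded so the sketch still runs without pandas installed).

```python
from functools import singledispatch


@singledispatch
def select_chrom(intervals, chrom):
    """Hypothetical helper: keep only the intervals on one chromosome."""
    # Fallback for any type that has no registered backend.
    raise NotImplementedError(
        f"no backend registered for {type(intervals).__name__}"
    )


@select_chrom.register(list)
def _select_chrom_rows(intervals, chrom):
    # Stand-in for a row-oriented backend: a list of dicts, one per interval.
    return [row for row in intervals if row["chrom"] == chrom]


@select_chrom.register(dict)
def _select_chrom_columns(intervals, chrom):
    # Stand-in for a columnar backend: a dict mapping column name -> list.
    keep = [c == chrom for c in intervals["chrom"]]
    return {
        name: [value for value, k in zip(values, keep) if k]
        for name, values in intervals.items()
    }


# Optional: register pandas only if it happens to be installed.
try:
    import pandas as pd
except ImportError:
    pd = None

if pd is not None:
    @select_chrom.register(pd.DataFrame)
    def _select_chrom_pandas(intervals, chrom):
        # Boolean-mask filtering on the "chrom" column.
        return intervals[intervals["chrom"] == chrom]


rows = [
    {"chrom": "chr1", "start": 0, "end": 10},
    {"chrom": "chr2", "start": 5, "end": 15},
    {"chrom": "chr1", "start": 20, "end": 30},
]
columns = {
    "chrom": ["chr1", "chr2", "chr1"],
    "start": [0, 5, 20],
    "end": [10, 15, 30],
}

rows_chr1 = select_chrom(rows, "chr1")
columns_chr1 = select_chrom(columns, "chr1")
```

The caller only ever sees `select_chrom`; which implementation runs is decided by the type of the first argument, so adding a new backend would mean registering one more function rather than touching the public API.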