Open wjones127 opened 8 months ago
Additionally, polars has a js library https://www.npmjs.com/package/nodejs-polars. It would be cool to add that same level of support to the lance js bindings.
That would be cool indeed. Related issue: https://github.com/lancedb/lancedb/issues/153
Hi @wjones127, I would like to try the tasks mentioned in this issue. Could you assign me to this task?
Several initial questions:
Would to_polars() include polars as an optional dependency?
Do we want a DataFrame or a LazyFrame as the return type of to_polars?
Also looking for suggestions to start on the tasks. Thanks a lot!
Could you assign me to this task?
Sure, done.
Would to_polars() include polars as an optional dependency?
Yes. We'd like to make sure we don't need to import it until necessary. Related: #1217
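The deferred import described in this answer could be sketched roughly as follows. The helper name import_optional and its extra parameter are hypothetical and not part of lance; this only illustrates loading a dependency at the point of use rather than at package import time.

```python
import importlib
from types import ModuleType


def import_optional(name: str, extra: str) -> ModuleType:
    """Import `name` only when called, with a readable error if it is missing.

    Hypothetical helper for illustration; `extra` is the package name to
    suggest in the error message.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{name} is required for this feature; "
            f"install it with `pip install {extra}`"
        ) from exc
```

A method such as to_polars() could then call import_optional("polars", "polars") inside its body, so users who never call it do not need polars installed.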
Do we want a DataFrame or a LazyFrame as the return type of to_polars?
Our other APIs are eager right now, so I'd say DataFrame. We could later add a to_polars_lazy() that returns a LazyFrame if we wanted, but I think getting the pushdown and such correct would take some work that we should defer for later.
I will do more research on how polars handles projection and predicate pushdown in its lazy API, but does this feature require anything to be done on the polars side?
We might already be able to work somewhat via pyarrow Dataset API. Part of that implementation is here: https://github.com/pola-rs/polars/blob/64bd3455f0d837f888f2d967cc545e2444f844a8/py-polars/polars/io/pyarrow_dataset/anonymous_scan.py#L14
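As a rough illustration of the pyarrow Dataset route, polars exposes scan_pyarrow_dataset(), which wraps a pyarrow dataset in a LazyFrame. The sketch below assumes the object passed in is a pyarrow.dataset.Dataset (or a subclass of it); the function name scan_lazy is hypothetical, and how much of the query actually gets pushed down is not verified here.

```python
def scan_lazy(dataset, columns):
    """Build a LazyFrame over a pyarrow-compatible dataset.

    `dataset` is assumed to be a pyarrow.dataset.Dataset or a subclass.
    """
    # Deferred import, so polars stays an optional dependency.
    import polars as pl

    # Nothing is read yet; this only builds a lazy query plan.
    lazy = pl.scan_pyarrow_dataset(dataset)

    # Selecting columns on the LazyFrame gives the polars optimizer a
    # chance to push the projection down to the scan at collect() time.
    return lazy.select(columns)
```

Calling .collect() on the result runs the query and returns an eager DataFrame.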
Is this already done? https://blog.lancedb.com/lancedb-polars-2d5eb32a8aa3/
to_polars() method that forwards arguments to to_table()
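A minimal sketch of what such a method might look like, written here as a free function so it stands alone. This is an assumption about the shape of the approach, not the actual lance implementation: it takes any object with a to_table() method that returns a pyarrow Table, forwards all keyword arguments to it, and converts the result eagerly with pl.from_arrow().

```python
def to_polars(dataset, **kwargs):
    """Return the dataset's contents as an eager polars DataFrame.

    Illustrative only: `dataset` is any object whose to_table(**kwargs)
    returns a pyarrow Table.
    """
    # Import polars only when the method is actually called, so it can
    # remain an optional dependency.
    import polars as pl

    # Forward every keyword argument unchanged to to_table(), then
    # convert the resulting Arrow table to a DataFrame.
    return pl.from_arrow(dataset.to_table(**kwargs))
```

With a real dataset this might be called as to_polars(ds, columns=["id"], limit=10), assuming to_table() accepts those keyword arguments.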