Open dseynaev opened 1 year ago
deltalake.read_table() -> deltalake_tbl.to_pyarrow() -> polars.from_arrow() -> polars_table
(diagram labels: "outer py-function", "to_pyarrow py")
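For reference, here is a rough sketch of that call chain in Python. The names in the diagram are schematic. As far as I know, the `deltalake` package exposes this path as `DeltaTable(...)` and `to_pyarrow_table()`, and polars converts the result with `pl.from_arrow()`. The wrapper name `read_delta_as_polars` is made up for illustration, and this is not necessarily how polars implements `read_delta` internally.

```python
def read_delta_as_polars(table_uri: str):
    """Load a Delta table into a polars DataFrame by going through PyArrow.

    Hypothetical helper: it only mirrors the three steps in the diagram.
    """
    # Imported inside the function so the module still loads when the
    # optional packages are not installed.
    from deltalake import DeltaTable  # Python bindings for delta-rs
    import polars as pl

    delta_tbl = DeltaTable(table_uri)         # delta-rs resolves the log and file list
    arrow_tbl = delta_tbl.to_pyarrow_table()  # PyArrow reads the Parquet files
    return pl.from_arrow(arrow_tbl)           # hand the Arrow table to polars
```

Every step here runs in Python, so an R package cannot reuse it directly.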
delta-rs has first-class Python support.
Hi @wjones127, can I ask: do you think it is realistic to make a minimal data-lake reader for r-polars via the delta-rs Rust API and arrow2? Or is there some filesystem magic from Python which is also needed?
I don't think filesystems are a blocker there; you can use the object stores that come with delta-rs.
But, especially if you are using arrow2, there's no ready-to-use scan function in delta-rs that you could plug into, so there's quite a bit of code you would have to read. Currently, in the Python package, delta-rs provides the file list and the file statistics, and the Python package then provides the actual file scanners through PyArrow. Eventually, we'll have the scanner available in delta-rs, and then it will be a lot easier to implement the R package, but that will take time.
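To make "file list and statistics" concrete, here is a minimal standard-library sketch. It is not the delta-rs API. It assumes the table directory has a `_delta_log` folder of newline-delimited JSON commit files, as the Delta transaction log protocol describes, and replays the `add` and `remove` actions. The function name `active_files` is made up. The sketch ignores checkpoints, partition values, deletion vectors and the `metaData`/`protocol` actions, all of which a real reader would have to handle.

```python
import json
from pathlib import Path


def active_files(table_dir):
    """Replay JSON commits in _delta_log; return {data file path: stats or None}."""
    log_dir = Path(table_dir) / "_delta_log"
    live = {}
    # Commit files are named with zero-padded version numbers,
    # so sorting by name gives commit order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                add = action["add"]
                # "stats" is a JSON-encoded string and may be missing.
                stats = json.loads(add["stats"]) if add.get("stats") else None
                live[add["path"]] = stats
            elif "remove" in action:
                # A removed file is no longer part of the current snapshot.
                live.pop(action["remove"]["path"], None)
    return live
```

A scanner would then open each returned Parquet file. The per-file stats (record counts, min/max values) are what would let it skip files that cannot match a filter.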
@sorhawell @wjones127 @Ploppz and I might have some capacity to investigate/contribute, but we will need some pointers/guidance.
would it be helpful to connect over Discord?
@dseynaev sure :) Which Discord channel do you prefer? It could be the r-polars subchannel of the polars Discord.
One stepping stone would be an interface for r-arrow dataset; r-polars would then need to make a scanner adaptor for that. I think it will take me a week or two to write, but it is very parallel to the py-polars/py-arrow interface. Then there would be good reasons to go ahead with #165.
Waiting for pola-rs/polars#17244
polars seems to support it but it's implemented on the python side: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_delta.html
the underlying delta lake interface lib is written in Rust though: https://docs.rs/deltalake/latest/deltalake/