Open jaychia opened 5 months ago
Hi @jaychia! We'd be happy to help with adding in a Daft backend for Ibis. Does or will Daft have a SQL interface? I'm assuming it's only the Python dataframe interface. Today, Ibis supports 3 Python dataframe backends:
I believe the Daft API is most similar to the Polars API. Thus, I'd suggest taking a look at the Polars backend and starting there for the implementation of Daft. Generally the process will be to get the backend started -- creating a connection, implementing enough functionality to get data into it (create_table, read_parquet, etc.), and implementing basic operations. There's no minimum required per se, but it would be good to support most of the basics upon initial release (ordering, aggregations, filtering, etc.).
Ibis defines over 300 operations, many of which won't be applicable to every backend. You can see current coverage here: https://ibis-project.org/support_matrix. So it's completely fine to start with an MVP for the Daft backend and increase coverage over time.
Let me know if you have any additional questions! I'd recommend essentially copying one of the existing backends (probably Polars), cutting it down, and working to get the test suite passing.
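As a rough illustration of the steps above (create a connection, get data in, support a few basic operations), here is a toy sketch in plain Python. Everything in it is hypothetical: `ToyBackend`, `connect`, and the method names are stand-ins chosen for this example and are not Ibis's actual backend interface, which is best learned from the Polars backend source.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class ToyBackend:
    """Hypothetical stand-in for a backend; NOT Ibis's real base class."""

    tables: dict[str, list[Row]] = field(default_factory=dict)

    def create_table(self, name: str, rows: list[Row]) -> None:
        # Step 2: enough functionality to get data into the backend.
        self.tables[name] = list(rows)

    def filter(self, name: str, predicate: Callable[[Row], bool]) -> list[Row]:
        # Step 3a: filtering.
        return [row for row in self.tables[name] if predicate(row)]

    def order_by(self, name: str, key: str) -> list[Row]:
        # Step 3b: ordering.
        return sorted(self.tables[name], key=lambda row: row[key])

    def aggregate_sum(self, name: str, by: str, column: str) -> dict[Any, Any]:
        # Step 3c: a single grouped aggregation (sum of `column` per `by`).
        totals: dict[Any, Any] = {}
        for row in self.tables[name]:
            totals[row[by]] = totals.get(row[by], 0) + row[column]
        return totals


def connect() -> ToyBackend:
    # Step 1: creating a connection.
    return ToyBackend()


con = connect()
con.create_table("t", [{"k": "a", "v": 3}, {"k": "b", "v": 1}, {"k": "a", "v": 2}])
filtered = con.filter("t", lambda row: row["v"] > 1)
ordered = con.order_by("t", "v")
totals = con.aggregate_sum("t", by="k", column="v")
```

A real backend would translate Ibis operations into the engine's own API rather than evaluating rows in Python. The point is only the order of work: connection first, then data loading, then a small set of operations that can grow over time.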
Yes indeed - Daft is probably most similar to the Polars lazy API. We do not yet have a SQL frontend.
Would https://github.com/ibis-project/ibis/blob/main/ibis/backends/polars/tests/conftest.py be a good place to start to implement a backend?
@jaychia apologies for the slow response! yes, something like that -- you can take a look at the Polars implementation, get the basic tests passing, and go from there
A quick update here:
The team is actively looking at building up a SQL frontend to Daft. We have basic support up already, with a more extensive roadmap detailed here: https://github.com/orgs/Eventual-Inc/projects/8/views/1
That might end up being the easiest way to integrate ibis, given that we can use SQL as the narrow waist between the ibis and Daft backends.
Let me know if that makes sense, and if that might be the better way forward?
hi @jaychia, that sounds like it would be a great option. is there a specific SQL dialect daft is targeting? if so, we could probably re-use one of the existing SQL compilers within Ibis (provided by SQLGlot)
No specific dialect at the moment, we're still building out SQL support in Daft and can provide more updates as we go along.
IIUC then if we are compatible with any of SQLGlot's target dialects then we should be good to go? Am I understanding this correctly that Ibis does: dataframe syntax -> some SQL dialect --- SQLGlot ---> some target SQL dialect --> Daft dataframe query plan?
cc @universalmind303 who is working on our SQL support
IIUC then if we are compatible with any of SQLGlot's target dialects then we should be good to go
Correct.
Am I understanding this correctly that Ibis does: dataframe syntax -> some SQL dialect --- SQLGlot ---> some target SQL dialect --> Daft dataframe query plan?
Not quite, but close. It's: dataframe syntax -> Ibis internal representation -> SQLGlot -> target SQL dialect -> Daft query plan
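To make the shape of that pipeline concrete, here is a toy sketch in plain Python. It is not Ibis or SQLGlot code: the `Table`, `Filter`, and `Select` classes stand in for an internal representation, and the hand-written `compile_to_sql` function stands in for the SQL generation step that SQLGlot handles in Ibis. All of these names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Toy IR node for a named table (hypothetical, not an Ibis class)."""

    name: str

    def filter(self, predicate: str) -> "Filter":
        # Dataframe-style call that builds an IR node instead of executing.
        return Filter(self, predicate)


@dataclass(frozen=True)
class Filter:
    parent: Table
    predicate: str

    def select(self, *columns: str) -> "Select":
        return Select(self, columns)


@dataclass(frozen=True)
class Select:
    parent: Filter
    columns: tuple


def compile_to_sql(node: Select) -> str:
    # Stand-in for the compile step: walk the IR and emit one SQL string.
    column_list = ", ".join(node.columns)
    filter_node = node.parent
    return (
        f"SELECT {column_list} "
        f"FROM {filter_node.parent.name} "
        f"WHERE {filter_node.predicate}"
    )


# dataframe syntax -> toy IR
expr = Table("t").filter("v > 1").select("k", "v")
# toy IR -> SQL string, which the engine would then parse into a query plan
sql = compile_to_sql(expr)
print(sql)  # SELECT k, v FROM t WHERE v > 1
```

The last hop, from SQL text to a Daft query plan, would happen inside Daft's SQL frontend and is not shown here.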
Which new backend would you like to see in Ibis?
Hi! I would like to explore building a Daft (www.getdaft.io) backend for Ibis.
I am one of the maintainers of the project, and we have had some user interest in using Ibis as an interface for Daft. Daft is a distributed query engine built with a Python dataframe API, with most of its internals written in Rust.
We're not sure where to begin/how to think about potential integrations but would love some pointers. Primarily:
Excited for this :)