ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

feat: Support Daft dataframe as a backend #8904

Open jaychia opened 5 months ago

jaychia commented 5 months ago

Which new backend would you like to see in Ibis?

Hi! I would like to explore building an Ibis backend for Daft (www.getdaft.io).

I am one of the maintainers of the project, and we have had some user interest in using Ibis as an interface for Daft. Daft is a distributed query engine built with a Python dataframe API, with most of its internals written in Rust.

We're not sure where to begin/how to think about potential integrations but would love some pointers. Primarily:

  1. Is there a core set of features that should be implemented first? And how much surface area does this involve?
  2. How do we then incrementally build out new features to increase support of the Ibis API in totality?

Excited for this :)


lostmygithubaccount commented 5 months ago

Hi @jaychia! We'd be happy to help with adding in a Daft backend for Ibis. Does or will Daft have a SQL interface? I'm assuming it's only the Python dataframe interface. Today, Ibis supports 3 Python dataframe backends:

  1. pandas
  2. Dask
  3. Polars

I believe the Daft API is most similar to the Polars API. Thus, I'd suggest taking a look at the Polars backend and starting there for the implementation of Daft. Generally the process will be to get the backend started -- creating a connection, implementing enough functionality to get data into it (create_table, read_parquet, etc.), and implementing basic operations. There's no minimum required per se, but it would be good to support most of the basics upon initial release (ordering, aggregations, filtering, etc.).
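To give a rough feel for how small that starting surface can be, here is a minimal sketch in plain Python. Everything in it (`ToyBackend`, `connect`, `sum_by`, the `sales` table) is hypothetical and only illustrates the order of work described above: connect, get data in, then support filtering, ordering, and aggregation. It is not the Ibis backend interface, and a real backend would follow the structure of the Polars backend instead.

```python
from collections import defaultdict


class ToyBackend:
    """Hypothetical stand-in for a dataframe backend; NOT the Ibis API."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, rows):
        # Step 2: get data into the backend (rows are plain dicts here).
        self._tables[name] = [dict(r) for r in rows]
        return self._tables[name]

    def table(self, name):
        return self._tables[name]

    # Step 3: the basic operations an initial release would want.
    @staticmethod
    def filter(rows, predicate):
        return [r for r in rows if predicate(r)]

    @staticmethod
    def order_by(rows, key, descending=False):
        return sorted(rows, key=lambda r: r[key], reverse=descending)

    @staticmethod
    def sum_by(rows, group_key, value_key):
        totals = defaultdict(int)
        for r in rows:
            totals[r[group_key]] += r[value_key]
        return dict(totals)


def connect():
    # Step 1: create a connection (hypothetical; no real engine behind it).
    return ToyBackend()


con = connect()
con.create_table(
    "sales",
    [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 5},
        {"region": "east", "amount": 7},
    ],
)
rows = con.table("sales")
big = con.filter(rows, lambda r: r["amount"] > 6)
ordered = con.order_by(rows, "amount")
totals = con.sum_by(rows, "region", "amount")
```

In a real backend each of these methods would hand the work to the engine (Daft, in this case) rather than loop over rows in Python.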

Ibis defines over 300 operations, many of which won't be applicable to every backend. You can see current coverage here: https://ibis-project.org/support_matrix. So it's completely fine to start with an MVP for the Daft backend and increase coverage over time.
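One way to picture growing coverage incrementally is a registry of per-operation translators, where anything not yet registered fails loudly. The sketch below is an assumption-laden toy: the operation names, `TRANSLATORS`, `translates`, `execute`, and `coverage` are all made up for illustration and do not correspond to Ibis's real operation classes or to how the support matrix is computed.

```python
# Hypothetical operation names; Ibis's real operations are different classes.
ALL_OPS = ["Filter", "Sort", "Sum", "Mean", "StringContains", "ArrayLength"]

# Maps an operation name to the function that implements it for this backend.
TRANSLATORS = {}


def translates(op_name):
    """Decorator that registers a translator for one operation."""

    def register(func):
        TRANSLATORS[op_name] = func
        return func

    return register


@translates("Filter")
def _filter(rows, predicate):
    return [r for r in rows if predicate(r)]


@translates("Sort")
def _sort(rows, key):
    return sorted(rows, key=lambda r: r[key])


@translates("Sum")
def _sum(rows, key):
    return sum(r[key] for r in rows)


def execute(op_name, *args):
    # Unsupported operations raise instead of silently returning wrong data.
    try:
        translator = TRANSLATORS[op_name]
    except KeyError:
        raise NotImplementedError(
            f"{op_name} is not supported by this backend yet"
        ) from None
    return translator(*args)


def coverage():
    # Fraction of known operations that have a registered translator.
    supported = [op for op in ALL_OPS if op in TRANSLATORS]
    return len(supported) / len(ALL_OPS)
```

With three of the six toy operations registered, `coverage()` returns `0.5`; adding another `@translates(...)` function raises it without touching the rest of the code.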

Let me know if you have any additional questions! I'd recommend essentially copying one of the existing backends (probably Polars), cutting it down, and working to get the test suite passing.

jaychia commented 5 months ago

Yes indeed - Daft is probably most similar to the Polars lazy API. We do not yet have a SQL frontend.

Would https://github.com/ibis-project/ibis/blob/main/ibis/backends/polars/tests/conftest.py be a good place to start to implement a backend?

lostmygithubaccount commented 4 months ago

@jaychia apologies for the slow response! yes, something like that -- you can take a look at the Polars implementation, get the basic tests passing, and go from there

jaychia commented 4 weeks ago

A quick update here:

The team is actively looking at building up a SQL frontend to Daft. We have basic support up already, with a more extensive roadmap detailed here: https://github.com/orgs/Eventual-Inc/projects/8/views/1

That might end up being the easiest way to integrate Ibis, given that we can use SQL as the narrow waist between Ibis and Daft.

Let me know if that makes sense, and if that might be the better way forward?

lostmygithubaccount commented 3 weeks ago

hi @jaychia, that sounds like it would be a great option. is there a specific SQL dialect daft is targeting? if so, we could probably re-use one of the existing SQL compilers within Ibis (provided by SQLGlot)

jaychia commented 3 weeks ago

No specific dialect at the moment, we're still building out SQL support in Daft and can provide more updates as we go along.

IIUC then if we are compatible with any of SQLGlot's target dialects then we should be good to go? Am I understanding this correctly that Ibis does: dataframe syntax -> some SQL dialect --- SQLGlot ---> some target SQL dialect --> Daft dataframe query plan?

cc @universalmind303 who is working on our SQL support

gforsyth commented 3 weeks ago

> IIUC then if we are compatible with any of SQLGlot's target dialects then we should be good to go

Correct.

> Am I understanding this correctly that Ibis does: dataframe syntax -> some SQL dialect --- SQLGlot ---> some target SQL dialect --> Daft dataframe query plan?

Not quite, but close. It's: dataframe syntax -> Ibis internal representation -> SQLGlot -> target SQL dialect -> Daft query plan
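As a rough illustration of that flow, here is a toy sketch in plain Python. The `Table` and `Filter` nodes stand in for the internal representation, and the hand-written `to_sql` function stands in for the step that SQLGlot performs in Ibis. All of these names are hypothetical, the dialect handling is reduced to identifier quoting, and none of it reflects Ibis's or SQLGlot's actual code.

```python
from dataclasses import dataclass


# Toy "internal representation": a tree of nodes, independent of any dialect.
@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Filter:
    parent: object
    column: str
    op: str
    value: int


# Identifier quoting differs per dialect: DuckDB uses double quotes,
# MySQL uses backticks. Real dialect differences go far beyond this.
QUOTES = {"duckdb": '"', "mysql": "`"}


def quote(identifier, dialect):
    q = QUOTES[dialect]
    return f"{q}{identifier}{q}"


def to_sql(node, dialect):
    """Hypothetical compiler from the toy tree to a SQL string."""
    if isinstance(node, Table):
        return f"SELECT * FROM {quote(node.name, dialect)}"
    if isinstance(node, Filter):
        inner = to_sql(node.parent, dialect)
        condition = f"{quote(node.column, dialect)} {node.op} {node.value}"
        return f"SELECT * FROM ({inner}) AS t WHERE {condition}"
    raise NotImplementedError(type(node).__name__)


# The same tree compiles to different SQL text depending on the target.
expr = Filter(Table("sales"), "amount", ">", 6)
duckdb_sql = to_sql(expr, "duckdb")
mysql_sql = to_sql(expr, "mysql")
```

The point of the sketch is that the expression tree is built once and only the last step depends on the dialect, which is why matching an existing SQLGlot dialect would let Daft reuse an existing compiler.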