marsupialtail / quokka

Making data lake work for time series
https://marsupialtail.github.io/quokka/
Apache License 2.0

Explore using DuckDB as computation engine #8

Closed: marsupialtail closed this issue 1 year ago

marsupialtail commented 1 year ago

DuckDB can be used in many places in Quokka, mostly replacing Polars. This can be approached in stages.

===== SQL predicates =====

Currently Quokka maintains its own interpreter that evaluates SQL predicates with Polars or Pandas (https://github.com/marsupialtail/quokka/blob/master/pyquokka/sql_utils.py#L19).

Perhaps we should just hand the predicate to DuckDB and let it do the filtering.

Pros of switching:

Cons of switching:

===== Aggregations and groupbys =====

Currently Quokka uses Apache Arrow to do aggregations and groupbys.

Perhaps we should also just use DuckDB.

===== Executor kernels =====

Quokka kernels today almost exclusively use Polars. Some can probably be switched to DuckDB.

Pros of switching:

Cons of switching: