DuckDB can be used in many places in Quokka, mostly as a replacement for Polars. The switch can be made in stages.
====== SQL predicates ======
Currently Quokka maintains an interpreter that executes SQL predicates with Polars or Pandas (https://github.com/marsupialtail/quokka/blob/master/pyquokka/sql_utils.py#L19).
Perhaps we should just execute these predicates with DuckDB.
Pros of switching:
No need to maintain this interpreter.
Possibly better performance (this still needs to be validated).
Cons of switching:
Keeping the interpreter might pay off if we eventually want Quokka to generate SIMD code in Gandiva fashion. The predicate could then be compiled into a shared object library that is loaded at runtime.
====== Aggregations and groupbys ======
Currently Quokka uses Apache Arrow to do aggregations and groupbys.
Perhaps we should also just use DuckDB here.
====== Executor kernels ======
Quokka kernels today almost exclusively use Polars. Some can probably be switched to DuckDB.
Pros of switching:
Possibly better out-of-core support.
Cons of switching:
We may want to wait for Arrow 10.0 and its fast out-of-core hash join support.