Goal
To showcase the power of embedded DBs, it makes sense to run the entire process with pandas as well as polars. polars and duckdb initially produced inconsistent results; this PR reconciles them by adding pandas as a third implementation and carefully debugging each intermediate DataFrame/table in the process.
Accomplishments
[x] Ensure polars and duckdb produce the same results with minimal changes to the intermediate functions
[x] Add pandas and ensure that its results are in line with both polars and duckdb
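For reviewers who want to repeat the comparison, here is a minimal sketch of one way to check that intermediate tables agree. It is not the code in this PR: `normalize` and `first_mismatch` are hypothetical helpers, and the sketch assumes each engine's output has first been converted to plain row tuples (for example via `df.itertuples(index=False, name=None)` in pandas, `df.rows()` in polars, or `.fetchall()` on a duckdb result).

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def normalize(rows: Iterable[Sequence], ndigits: int = 6) -> list:
    """Round floats, map NaN to None, and sort rows.

    Sorting removes engine-specific row order; rounding absorbs small
    floating-point differences. Both choices are assumptions of this sketch.
    """
    cleaned = []
    for row in rows:
        out = []
        for v in row:
            if isinstance(v, float):
                v = None if math.isnan(v) else round(v, ndigits)
            out.append(v)
        cleaned.append(tuple(out))
    # Sort on repr so rows containing None can still be ordered.
    return sorted(cleaned, key=repr)


def first_mismatch(left, right, ndigits: int = 6) -> Optional[Tuple]:
    """Return (index, left_row, right_row) for the first difference, else None."""
    a, b = normalize(left, ndigits), normalize(right, ndigits)
    for i, (ra, rb) in enumerate(zip(a, b)):
        if ra != rb:
            return i, ra, rb
    if len(a) != len(b):
        # One table has extra rows; report the first unmatched one.
        i = min(len(a), len(b))
        return i, (a[i] if i < len(a) else None), (b[i] if i < len(b) else None)
    return None


# Illustrative rows only, not data from this PR.
pandas_rows = [("a", 0.1 + 0.2), ("b", 2.0)]
duckdb_rows = [("b", 2.0), ("a", 0.3)]
print(first_mismatch(pandas_rows, duckdb_rows))
```

Running the same check after every intermediate step narrows a disagreement down to the first function where the engines diverge.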
To do
[ ] Add pytest-benchmark so that the pandas, polars, and duckdb implementations are benchmarked consistently for comparison