prrao87 / duckdb-study

Compare DuckDB, Polars and Pandas for generating an artificial dataset of persons and companies
MIT License

use polars lazy api #5

Open ritchie46 opened 1 year ago

ritchie46 commented 1 year ago

I see you are running eager Polars, e.g. using read_parquet. For a fair comparison you should use scan_parquet, write your query lazily, and call collect at the end.

As it stands, DuckDB can apply query optimization, whilst Polars cannot, because you force every operation to execute immediately. That's not an apples-to-apples comparison.

prrao87 commented 1 year ago

Thanks @ritchie46, that's a good point and I was questioning myself about this just yesterday! 😀

I'll switch to the lazy API and rerun.