pola-rs / polars-benchmark

Apache License 2.0

Update pandas queries to better match original SQL queries #96

Closed stinodego closed 7 months ago

stinodego commented 7 months ago

The pandas queries contain some hand optimizations that should not be there (e.g. filters applied before joins).

ritchie46 commented 7 months ago

@MarcoGorelli could you perhaps do a pass of the pandas queries (might be a copy/paste from narwhals)? I think that pre-computing the group-by aggregations elementwise is okay-ish (with a comment), but filters and projections should definitely be left to the optimizer.

MarcoGorelli commented 7 months ago

Thanks for the ping, I'll run these and compare. Ideally, "pandas" vs "pandas via narwhals" should be very close.

For q1, that's now the case; the query looks fine.

For q2, they're way off, and looking at the code, the pandas query is filtering before joining. I'll update this and the others where there's a perf difference
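To illustrate the kind of difference being discussed, here is a minimal sketch on toy data. It is not the actual q2 code: the tables, the function names `q_filter_first` and `q_sql_order`, and the predicate are hypothetical, with column names only loosely modelled on TPC-H. Both versions return the same rows, but the first shrinks the input before the join by hand, while the second keeps the shape of the SQL (join, then `WHERE`).

```python
import pandas as pd

# Hypothetical toy tables; not the benchmark data.
part = pd.DataFrame({"p_partkey": [1, 2, 3], "p_size": [15, 10, 15]})
partsupp = pd.DataFrame(
    {
        "ps_partkey": [1, 1, 2, 3],
        "ps_supplycost": [5.0, 7.0, 3.0, 9.0],
    }
)


def q_filter_first(part: pd.DataFrame, partsupp: pd.DataFrame) -> pd.DataFrame:
    # Hand-optimized: filter `part` before joining, so the join sees fewer rows.
    small = part[part["p_size"] == 15]
    return small.merge(partsupp, left_on="p_partkey", right_on="ps_partkey")


def q_sql_order(part: pd.DataFrame, partsupp: pd.DataFrame) -> pd.DataFrame:
    # Mirrors the SQL shape: join first, then apply the WHERE predicate.
    joined = part.merge(partsupp, left_on="p_partkey", right_on="ps_partkey")
    return joined[joined["p_size"] == 15]


a = q_filter_first(part, partsupp).reset_index(drop=True)
b = q_sql_order(part, partsupp).reset_index(drop=True)
assert a.equals(b)  # same result, different amount of work
```

Since pandas evaluates eagerly, the second form joins every row before filtering, which is the cost a query optimizer would otherwise remove by pushing the predicate down.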