marsupialtail opened this issue 1 year ago (status: Open)
That would be great!
@ritchie46 I have started but ran into a problem. Here is how I wrote query 13:
import polars

ref_customer = polars.read_csv("/home/ziheng/tpc-h/customer.tbl", sep="|")
ref_orders = polars.read_csv("/home/ziheng/tpc-h/orders.tbl", sep="|").filter(
    ~(polars.col("o_comment").str.contains("special") & polars.col("o_comment").str.contains("requests"))
)
ref = (
    ref_customer.join(ref_orders, left_on="c_custkey", right_on="o_custkey", how="left")
    .with_column(polars.col("o_orderkey").is_not_null().alias("o_orderkey_1"))
    .groupby("c_custkey").agg([polars.col("o_orderkey_1").sum()])
    .groupby("o_orderkey_1").count()
    .sort("count", reverse=True)
)
# .sort("o_orderkey_1", reverse=True)
However, this gives wrong results. Any suggestions?
Never mind, I know what the problem is. The two separate `contains` checks match the words in either order, but I need to make sure "special" comes before "requests", so I have to use a regex.
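For anyone hitting the same thing, here is a minimal sketch of the difference, written with Python's standard `re` module rather than Polars so it does not depend on a particular Polars version. The helper names and sample comments are made up for illustration. It assumes the query's predicate is `o_comment NOT LIKE '%special%requests%'`, which is how TPC-H query 13 is usually written.

```python
import re

# Regex equivalent of the SQL pattern '%special%requests%':
# "special" must appear first, followed later by "requests".
# re.DOTALL lets "." span newlines, as "%" does in LIKE.
ORDERED = re.compile(r"special.*requests", re.DOTALL)


def matches_two_contains(comment: str) -> bool:
    """Order-insensitive check: both words appear somewhere."""
    return "special" in comment and "requests" in comment


def matches_ordered_regex(comment: str) -> bool:
    """Order-sensitive check: "special" occurs before "requests"."""
    return ORDERED.search(comment) is not None


# Hypothetical sample comments, not taken from the TPC-H data.
samples = [
    "special packages among the requests",  # special ... requests
    "requests nag; special deposits",       # requests ... special
    "regular deposits",                     # neither word
]

for text in samples:
    print(text, matches_two_contains(text), matches_ordered_regex(text))
```

The second sample is the case that differs: the two-`contains` version treats it as a match and filters the order out, while the ordered pattern keeps it. In Polars, `str.contains` treats its pattern as a regex by default, so a single `polars.col("o_comment").str.contains("special.*requests")` should express the ordered check, but verify that against the version you have installed.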
Pandas implementation of all 22 queries: https://gist.github.com/UranusSeven/55817bf0f304cc24f5eb63b2f1c3e2cd
Polars / pyspark / DuckDB have full query coverage. We should still include the pandas queries. Perhaps the link above could help.
Polars can run them for sure. Do you want a contribution?