h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

Polars is not updated #186

Closed: alippai closed this 3 years ago

alippai commented 3 years ago

It's stuck on version 0.4.5. I looked into the code but don't see any version constraints, so a fresh pip install should pick up a newer version. The report claims:

Benchmark run took around 148.3 hours.
Report was generated on: 2021-02-02 01:18:00 PST.

so this might indicate a caching or deployment issue.
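One way to rule out a stale environment is to ask Python which polars version it actually resolves, then compare that with the version printed in the report. Below is a minimal sketch using only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and is not part of db-benchmark. It assumes you run it inside whatever Python environment the benchmark uses.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the version this environment resolves, so it can be compared
    # with the version shown in the benchmark report.
    print("polars", installed_version("polars"))
```

If this prints an older version than the one on PyPI, the environment was not upgraded, which points to a cache or deploy problem rather than a pinned dependency.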

alippai commented 3 years ago

cc @ritchie46

jangorecki commented 3 years ago

Recently I was running the groupby2014 branch, which currently runs only 3 solutions. I will schedule a run of all solutions today. Thanks for the prompt.

alippai commented 3 years ago

That explains it. Thanks for this useful benchmark, it's a really great ecosystem overview.

jangorecki commented 3 years ago

I just started the run now. Multiple solutions got new releases recently, so it will take a couple of days to finish. I will close this issue when done.

...
Benchmark solutions to run: data.table, pydatatable, dplyr, dask, juliadf, polars
...
jangorecki commented 3 years ago

Resolved, report updated.

impredicative commented 3 years ago

Polars is a fairly low-quality package that seems more like a perpetual alpha release. I say this because it has numerous serious bugs and multiple segfaults. It even corrupts data. IMO it should be excluded from the benchmark altogether.

alippai commented 3 years ago

@impredicative it would be a funny list if we dropped packages for having serious bugs. Dask, pandas, cudf, and DataFrames.jl have all burned me in production. If you want to help the open source community, create a project similar to db-benchmark that tests conformance to your expected results using your own data and queries. It would undoubtedly be useful. @jangorecki is already doing epic work maintaining this repo, and tracking quality as well is definitely out of scope for this project (other OLAP products, like TiDB, Vertica, and MemSQL, haven't even been added yet). Spamming the issue queues of the related projects is unwanted and doesn't create value; please stop.

jangorecki commented 3 years ago

@impredicative it would make more sense if you linked those bug reports (in a new dedicated issue). If a project is not maintained and bugs go unresolved for an extended period of time, then we could eventually consider dropping that solution from the benchmark. In my experience with the solutions in the benchmark, several others would be better candidates for dropping.

ritchie46 commented 3 years ago

For context, @impredicative was blocked on the Polars repo for being really rude and complaining that I wouldn't implement a feature, or at least not the way he would like to see it.

Constructive feedback is of course more than welcome, and if there are any issues/bugs that need to be resolved, I am happy to do so in discussion with the users. I am afraid that the request above has more to do with his relationship with me than with real bugs/segfaults. But if there are any, please let me know. :)

impredicative commented 3 years ago

All packages have bugs, but only polars closes them without a fix and then blocks the reporter for calling out that it was closed without a fix. As advised by @jangorecki, I will follow up independently in a new issue.