Closed alippai closed 3 years ago
cc @ritchie46
Recently I was running the groupby2014 branch, which runs only 3 solutions as of now. I will schedule a run of all solutions today. Thanks for the prompt.
That explains it. Thanks for this useful benchmark, it's a really great ecosystem overview.
I just started the run. Multiple solutions got new releases recently, so it will take a couple of days to finish. I will close this issue when done.
...
Benchmark solutions to run: data.table, pydatatable, dplyr, dask, juliadf, polars
...
Resolved, report updated.
Polars is a fairly low quality package that seems more of a perpetual alpha release. I say this because it has numerous serious bugs and multiple segfaults. It even corrupts data. IMO it should be excluded from the benchmark altogether.
@impredicative it would be a funny list if we dropped packages because they have serious bugs. Dask, pandas, cudf, DataFrames.jl all already burned me in production. If you want to help the open source community, create a similar project to db-benchmark for testing conformance to your expected results using your data and queries. It'd be undoubtedly useful. @jangorecki is already doing epic work maintaining this repo, and it's definitely out of the scope of this package to track quality as well (and other OLAP products are not added yet, like TiDB, Vertica, MemSQL). Spamming the issue queues in the related projects is unwanted and doesn't create value. Stop it, please.
@impredicative it would make more sense if you linked those bug reports (in a new dedicated issue). If a project is not maintained and bugs are not being resolved for an extended period of time, then we could eventually think about dropping a solution from the benchmark. In my experience with the solutions in the benchmark, there are multiple other solutions that would be better candidates to be dropped.
For context, @impredicative was blocked on the Polars repo for being really rude and complaining that I wouldn't implement a feature or at least not the way he likes to see it.
Constructive feedback is of course more than welcome, and if there are any issues/bugs that need to be resolved, I will happily do so in discussion with the users. I am afraid that the request above is more due to his relationship with me than real bugs/segfaults. But if there are any, please let me know. :)
All packages have bugs, but only polars closes them without a fix, then blocks the reporter for calling out that it was closed without a fix. As advised by @jangorecki, I will follow up independently in a new issue.
It's stuck on version 0.4.5. I looked into the code, but I don't see any version constraints, so a fresh pip install should result in a new version. The report claims:
so this might indicate some cache or deploy issues.
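One quick way to tell a stale environment from a stale report would be to print the version that is actually installed where the benchmark runs. This is only a minimal sketch using the standard library: the helper name `installed_version` is made up for illustration, and I'm assuming the package is installed under the distribution name `polars`.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it is not part of db-benchmark.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name on PyPI is "polars".
version = installed_version("polars")
if version is None:
    print("polars is not installed in this environment")
else:
    print(f"polars {version} is installed")
```

If this prints something newer than 0.4.5 while the report still shows 0.4.5, the problem is more likely on the report/cache side than in the install step.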