h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0
325 stars, 88 forks

Add Vaex comparison #180

Open maccam912 opened 3 years ago

maccam912 commented 3 years ago

Just found this, thanks for this repo! I see others have offered new projects that could be included in the comparison. Might I suggest Vaex as a future contender?

https://vaex.readthedocs.io/en/latest/

It's lazy, so I'm not sure what magic needs to be done to get actual times for group-by and join. Presumably whatever Spark is doing, now that I think about it.

I will now attempt to figure out how to add a tag to this issue.

jangorecki commented 3 years ago

Thank you for filing this request. Spark does provide a method for materializing a computation. As long as Vaex provides a similar one, laziness shouldn't cause any issues. Note that the author of Vaex, @maartenbreddels, was already reproducing this benchmark and was planning to make a PR, as mentioned in #150.
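To illustrate the laziness point, here is a minimal sketch in plain Python. `LazyColumn`, `materialize`, and `timed` are hypothetical names made up for this example and are not part of the Vaex or Spark API. It only shows why a benchmark has to time the step that forces evaluation, not the step that builds the lazy pipeline.

```python
import time


class LazyColumn:
    """Hypothetical stand-in for a lazy column (not Vaex or Spark API)."""

    def __init__(self, compute):
        # Deferred work: a zero-argument callable that produces the data.
        self._compute = compute

    def map(self, fn):
        # Only records a new deferred step; no data is processed here.
        return LazyColumn(lambda: [fn(x) for x in self._compute()])

    def materialize(self):
        # Forces evaluation of the whole deferred pipeline.
        return self._compute()


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


data = LazyColumn(lambda: list(range(200_000)))

# Timing this step alone measures pipeline construction, not the query.
pipeline, build_seconds = timed(lambda: data.map(lambda x: x * 2))

# Timing the forced evaluation is what a benchmark actually needs.
values, run_seconds = timed(pipeline.materialize)

print(f"build: {build_seconds:.6f}s, materialize: {run_seconds:.6f}s")
```

Whatever call a lazy library offers to force evaluation would take the place of `materialize()` in the timed section.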

ptomecek commented 3 years ago

@jangorecki @maartenbreddels Looks like #150 is closed. Was there ever a PR to include Vaex in the benchmarks? I don't see it in the latest generated report.

jangorecki commented 3 years ago

no

maartenbreddels commented 3 years ago

This is planned 🙂

FullyWashable commented 1 year ago

Hi, I'm looking at this and at the linked pull request, and the reason why the Vaex comparison still isn't included is a bit over my head. Is that comparison available somewhere else? Or is there a similar one that does include Vaex?