[Enh]: Better benchmarking routine

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

https://narwhals-dev.github.io/narwhals/

MIT License

625 stars 92 forks source link

[Enh]: Better benchmarking routine #805

Open FBruzzesi opened 3 months ago

FBruzzesi commented 3 months ago

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

No response

Please describe the purpose of the new feature or describe the problem to solve.

There are some features that can require extra attention or are worth benchmarking to understand if worth implementing. For example I am thinking of #500 and #743.

Suggest a solution if possible.

I checked how other libraries do that, specifically pydantic. They use codspeed which seems to have a free tier for public repos.

Question is: what to benchmark?! Would TPCH queries in main vs branch be a reasonable test?

If you have tried alternatives, please describe them below.

Currently very manual effort on kaggle

Additional information that may help us understand your needs.

No response

DeaMariaLeon commented 3 months ago

Hey, I installed Codspeed for benchmarking on pydata/sparse. I'm transforming benchmarks from asv to codspeed "as we speak".

Would TPCH queries in main vs branch be a reasonable test?

Codpeed tests the opened PR against main, if that was the question. 🤔 It can be setup to block the merge if there is a regression. It sends the report as a comment on the PR. If you need details (or help) let me know. Those are my 15 cents. 😇 edit: Benchmarks for Codspeed run in the CI. TPCH tests with a lot of data, doesn't it?

FBruzzesi commented 3 months ago

Hey Dea, thanks for the input. That's what happens when I open issues in a rush. Let me try to clarify some points and ideas.

My understanding is that one can mark some test for benchmarking, and I am wondering what could these test be.

TPCH tests with a lot of data, doesn't it?

One option is to run TPCH queries with the subset of the data we have in tests/data/ folder. It should not take as long as the actual TPCH benchmarking.

Codpeed tests the opened PR against main, if that was the question. 🤔

Yes that is exactly my point: PR (branch) vs main, so I am getting the process right 👌 I wonder though if it could be trigger only, as it is definitly an overkill for most PRs.

If you need details (or help) let me know.

I have never used it so far, I am happy to give it a spin, but expect to be pinged for help 🙈

DeaMariaLeon commented 3 months ago

I wonder though if it could be trigger only, as it is definitly an overkill for most PRs.

It doesn't say in the documentation. I guess one could do it "somehow" with the CI, but I don't think that it's an out-of-the-box option. You can select if the report is sent all the time to the PR, or only if there is a failure/improvement... but that's all they mention.

but expect to be pinged for help

I truly doubt that you'll ever need help from me 😁 .. but sure!

FBruzzesi commented 4 weeks ago

Commenting to discuss the idea: as plotly is understandably concerned about performances, maybe we could use the script they shared to assess if we have a performance drop