petr-tik opened this issue 5 years ago
That would be awesome, of course. Do you want to take that over?
I have had some bad performance regressions caused by the compiler in the past. Not an obvious jemalloc-related thing, but a change in inlining that had a catastrophic impact. It would be great to spot those rapidly.
Happy to help.
Can we flesh out the design before I start?
Couple of questions:
I am concerned about benchmarking on TravisCI itself, because it is an uncontrolled environment running under several layers of virtualisation, which is bound to make results unreliable. The author of criterion.rs says this affects results.
The best way is to dedicate a server, give TravisCI SSH access to it and run the benchmarks there. The server can be either virtual or physical. A VPS from DigitalOcean, Linode or OVH is still an abstraction, but hopefully a more consistent one. This needs to be tested (a rough sketch of the CI side of this follows the points below).
If you or anyone else has a home server you don't mind dedicating to this, that would be great: we could control the environment and benchmark under consistent conditions. It still requires SSH access from TravisCI, which in turn requires trust between the people working on tantivy.
I found packet.net, a bare-metal server provider with an API and advertised support for open source. Whether it is worth it depends on our requirements, their pricing and the availability of other options. We might not need them if we decide to benchmark less often than every CI run, or packet might simply be too expensive.
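For the dedicated-server option, the CI side could be as small as an SSH invocation that kicks off the run remotely. Here is a minimal Python sketch, assuming a placeholder host bench.example.org, a search-benchmark checkout on that machine and a run_benchmark.sh entry point; none of these exist yet and are only illustrative:

```python
# Minimal sketch: trigger a benchmark run on a dedicated machine from CI.
# Host, user, paths and the entry-point script are placeholders/assumptions.
import os
import subprocess

def trigger_remote_benchmark(commit_sha: str) -> None:
    """Ask the dedicated box to benchmark a given tantivy commit."""
    remote = "bench@bench.example.org"  # assumed dedicated server
    command = (
        "cd ~/search-benchmark && "
        f"./run_benchmark.sh {commit_sha}"
    )
    # BatchMode avoids hanging on a password prompt if the key is missing.
    subprocess.run(
        ["ssh", "-o", "BatchMode=yes", remote, command],
        check=True,
    )

if __name__ == "__main__":
    # TravisCI exposes the commit under test as TRAVIS_COMMIT.
    trigger_remote_benchmark(os.environ.get("TRAVIS_COMMIT", "HEAD"))
```

TravisCI could hold the matching private key as an encrypted setting, which would limit the trust problem to that one machine.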
If/when we clarify the points above, I will be happy to start relevant work and potential conversations with providers.
Can you please clarify the points above?
Continuous benchmarking
Add a CI-like job to run the benchmark automatically.
It will help developers, potential users and tantivy-curious people track performance numbers continuously. Automating this also means less stress and hassle for the maintainers and developers of tantivy.
Granularity
We can choose to run the benchmark either on every commit or on every release.
On every commit
Integrate the benchmarking suite into CI on the main tantivy repo. Using TravisCI's after_success build stage, run the benchmark and append the results to results.json in the search-benchmark repo (a rough sketch of this step follows the pros and cons below).
Pros:
- Commit-specific perf numbers: easier to triage perf regressions, and it builds a more detailed picture of the hot path for the future.
- Automated: no need to fiddle with re-running benchmarks locally.
Costs/cons:
- Too much noise: some commits are WIP or harm perf for the sake of a refactor. Is it really necessary to keep that data?
- Makes every CI job run longer.
- Benchmarking should be done on a dedicated machine to guarantee similar conditions, but CI jobs run inside uncontrolled layers of abstraction (Docker inside a VM, inside a VM).
- To control the environment and keep it automated, we would need to dedicate a VPS instance, which is an expense, a potential security vulnerability and an administration burden.
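To make the after_success step above concrete, here is a rough sketch of what the job could do once the benchmark has produced its numbers: append them to results.json in a clone of the search-benchmark repo and push. The repo URL, the file layout (a JSON list of runs) and the GH_TOKEN secret are all assumptions:

```python
# Rough sketch of an after_success step: append a new benchmark result to
# results.json in the search-benchmark repo and push it back.
# Repo URL, file layout and the GH_TOKEN variable are assumptions.
import json
import os
import subprocess

def publish_result(result: dict, commit_sha: str) -> None:
    token = os.environ["GH_TOKEN"]  # write-scoped token stored as a CI secret
    repo_url = f"https://{token}@github.com/tantivy-search/search-benchmark.git"

    subprocess.run(
        ["git", "clone", "--depth", "1", repo_url, "search-benchmark"],
        check=True,
    )

    results_path = os.path.join("search-benchmark", "results.json")
    with open(results_path) as f:
        results = json.load(f)  # assumed to be a JSON list of runs
    results.append({"commit": commit_sha, **result})
    with open(results_path, "w") as f:
        json.dump(results, f, indent=2)

    # Note: CI environments usually also need git user.name/user.email set.
    subprocess.run(["git", "-C", "search-benchmark", "add", "results.json"], check=True)
    subprocess.run(["git", "-C", "search-benchmark", "commit",
                    "-m", f"Add benchmark results for {commit_sha}"], check=True)
    subprocess.run(["git", "-C", "search-benchmark", "push"], check=True)
```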
On every release
Same as above, but use git tags to tell whether a commit corresponds to a new release (see the sketch after the pros and cons below).
Pros:
- Fewer runs: cheaper on hardware and doesn't slow builds down.
- Releases are usually semantically important points in history, where we are interested in perf.
Cons/costs:
- Still needs dedicated hardware to run consistently.
- Needs push access to the tantivy-benchmark repo.
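The release check itself could be a small script run at the start of the job. A sketch, assuming the job runs inside a tantivy checkout with tags fetched:

```python
# Sketch: only benchmark commits that carry a release tag.
# Assumes the job runs inside a git checkout with tags fetched.
import subprocess
from typing import Optional

def release_tag_for_head() -> Optional[str]:
    """Return the tag pointing at HEAD, or None if this commit is not a release."""
    out = subprocess.run(
        ["git", "tag", "--points-at", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out.splitlines()[0] if out else None

if __name__ == "__main__":
    tag = release_tag_for_head()
    if tag is None:
        print("Not a release commit, skipping benchmark.")
    else:
        print(f"Release {tag}: running benchmark.")
```

TravisCI also exposes a TRAVIS_TAG variable on tag builds, which might make even this script unnecessary.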
Presentation
Showing data from every commit might be unnecessarily overwhelming. The current benchmark front-end is clean (imho) and makes it easy to compare results across queries and versions.
On the front-end, we can show 0.6, 0.7, 0.8, 0.9 and the latest commit or release.
Power users or admins can be given the option to expand the table to show every commit.
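A sketch of that default filtering, assuming results.json is a list of runs where release runs carry a tag field and every run carries a commit field; the real schema may differ:

```python
# Sketch: the default view shows only the listed releases plus the latest run;
# the results.json schema (a list of runs with "tag"/"commit" keys) is assumed.
import json

SHOWN_RELEASES = {"0.6", "0.7", "0.8", "0.9"}

def default_view(path: str = "results.json") -> list:
    with open(path) as f:
        runs = json.load(f)
    releases = [r for r in runs if r.get("tag") in SHOWN_RELEASES]
    latest = runs[-1:] if runs else []  # most recent run, release or not
    return releases + latest

def full_view(path: str = "results.json") -> list:
    """Power-user mode: every benchmarked commit."""
    with open(path) as f:
        return json.load(f)
```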
Implementation
A VPS that watches the main tantivy repo, builds and runs the benchmark, and commits new results at an agreed frequency.
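A rough sketch of that watcher; the repo path, branch name, benchmark command and polling interval are all placeholders:

```python
# Rough sketch of the VPS watcher: poll the tantivy repo, and when a new
# commit lands on the watched branch, rebuild, benchmark and publish results.
# Paths, branch name, commands and the polling interval are assumptions.
import subprocess
import time

REPO = "/srv/tantivy"
BRANCH = "master"
POLL_SECONDS = 3600

def git(*args: str) -> str:
    return subprocess.run(
        ["git", "-C", REPO, *args],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def main() -> None:
    last_seen = git("rev-parse", "HEAD")
    while True:
        git("fetch", "origin", BRANCH)
        newest = git("rev-parse", f"origin/{BRANCH}")
        if newest != last_seen:
            git("checkout", newest)
            # Build in release mode and run the benchmark suite.
            subprocess.run(["cargo", "build", "--release"], cwd=REPO, check=True)
            subprocess.run(["./run_benchmark.sh", newest], cwd=REPO, check=True)
            # publish_result(...) from the earlier sketch would push the numbers.
            last_seen = newest
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```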
Thoughts?