Closed andygrove closed 2 years ago
Hello @andygrove,
Glad to hear you are interested in running benchmarks using arrow-benchmarks-ci.
I added the ability to run benchmarks for multiple repos today after I got your comment.
PR https://github.com/ursacomputing/arrow-benchmarks-ci/pull/48 shows how a repo (e.g., https://github.com/ElenaHenderson/benchmarkable-repo) can be added to be benchmarked. This repo contains both code and benchmarks.
You can find results in Conbench now: https://conbench.ursa.dev/
Here are benchmark results for the last two commits in the repo compared to each other: https://conbench.ursa.dev/compare/runs/106c24eda7db4776aca487dda93a37ee...7f204c78e687406d98f4514e828cfeb2/
Here is the Buildkite pipeline with benchmark builds: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2
If the apache/arrow-rs and apache/arrow-datafusion repos contain benchmarks, you can use benchmarkable-repo as an example.
If benchmarks for the apache/arrow-rs and apache/arrow-datafusion repos will live in a different repo, you can use apache/arrow (with benchmarks in https://github.com/ursacomputing/benchmarks) as an example.
I will be documenting the process for adding a new repo to be benchmarked next week. Let me know if you need it sooner.
I can also answer questions if you want to proceed before the document is ready.
Have a great weekend!
Please let me know if you are interested in the abilities below and I will figure out how to make them work for multiple repos.
https://github.com/apache/arrow/pull/12275#issuecomment-1029972722
@andygrove I no longer work on Arrow benchmarks, but you might be able to make some use of this initial arrow-datafusion & arrow-rust benchmarking spike:
https://github.com/ursacomputing/benchmarks/pull/79/files
IIUC, you would need to do something similar, but in the arrow-datafusion & arrow-rust repos rather than the ursacomputing/benchmarks repo.
Thank you @ElenaHenderson and @dianaclarke for the responses. I am putting time aside to work on this over the coming week and will let you know if I have more questions.
@andygrove Here are 2 proof of concept pull requests to get you started.
Hi @ElenaHenderson. Both the arrow-rs and arrow-datafusion repos now have conbench benchmarks checked in.
Could you point me to the relevant documentation for the next step of adding these to a build pipeline?
@andygrove Working on the docs now. Sorry for not getting it done sooner.
Hello @andygrove,
The doc for adding a new benchmarkable repo: https://github.com/ursacomputing/arrow-benchmarks-ci/blob/main/docs/how-to-add-new-benchmarkable-repo.md
The doc for adding a new benchmark machine (once the repo is added): https://github.com/ursacomputing/arrow-benchmarks-ci/blob/main/docs/how-to-add-new-benchmark-machine.md
Note that I tested adding the arrow-rs repo this morning and ran its benchmarks on one of the machines (Ubuntu 20.04) where apache-arrow benchmarks are run, and everything worked. Here are the results of the arrow-rs benchmarks on Conbench:
https://conbench.ursa.dev/runs/acb47dec7d3b460da79d55da1ae9db19/
Ping me if you need anything.
Note that I removed all the code I added to test adding the arrow-rs repo.
Nice, thanks @ElenaHenderson!!!
@andygrove I think I've done the next step in this PR: https://github.com/ursacomputing/arrow-benchmarks-ci/pull/57
I think that leaves this final step for you: https://github.com/ursacomputing/arrow-benchmarks-ci/blob/main/docs/how-to-add-new-benchmark-machine.md
https://github.com/ursacomputing/arrow-benchmarks-ci/pull/57 is merged. @dianaclarke Thank you!
I am closing this issue as done since arrow-benchmarks-ci supports adding other repos to be benchmarked:
I would like to donate compute resources to run benchmarks for apache/arrow-rs and apache/arrow-datafusion, but according to the docs this doesn't seem to be possible and benchmarks can only be run against apache/arrow?