What is the process for running benchmarks on repos other than apache/arrow?

andygrove commented 2 years ago

I would like to donate compute resources to run benchmarks for apache/arrow-rs and apache/arrow-datafusion, but according to the docs this doesn't seem to be possible. Is it correct that benchmarks can only be run against apache/arrow?

ElenaHenderson commented 2 years ago

Hello @andygrove,

Glad to hear you are interested in running benchmarks using arrow-benchmarks-ci.

I added the ability to run benchmarks for multiple repos today after I got your comment.

PR https://github.com/ursacomputing/arrow-benchmarks-ci/pull/48 shows how a repo (e.g., https://github.com/ElenaHenderson/benchmarkable-repo) can be added to be benchmarked. This repo contains both code and benchmarks.

You can find results in Conbench now: https://conbench.ursa.dev/

Here are the benchmark results for the last two commits in the repo, compared to each other: https://conbench.ursa.dev/compare/runs/106c24eda7db4776aca487dda93a37ee...7f204c78e687406d98f4514e828cfeb2/
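For anyone scripting against these results, the comparison link above appears to follow the pattern `compare/runs/<run id>...<run id>/`. Here is a minimal Python sketch that builds such a link. It is inferred only from the single URL in this comment, not from any documented Conbench API. The function name `compare_runs_url` and its parameters are hypothetical, and the thread does not say which of the two run IDs is treated as the baseline.

```python
# Sketch only: the URL pattern is inferred from the one comparison link
# posted in this thread, not taken from Conbench documentation.
CONBENCH_URL = "https://conbench.ursa.dev"


def compare_runs_url(first_run_id: str, second_run_id: str,
                     base_url: str = CONBENCH_URL) -> str:
    """Build a link comparing two runs (hypothetical helper).

    The two run IDs are joined with three dots, as in the link above.
    Which ID acts as the baseline is an assumption left to the caller.
    """
    return f"{base_url.rstrip('/')}/compare/runs/{first_run_id}...{second_run_id}/"


if __name__ == "__main__":
    # Run IDs copied from the comparison link in this comment.
    print(compare_runs_url("106c24eda7db4776aca487dda93a37ee",
                           "7f204c78e687406d98f4514e828cfeb2"))
```

With the two run IDs from the link above, this should reproduce that same URL.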

Here is the Buildkite pipeline with benchmark builds: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2

If the apache/arrow-rs and apache/arrow-datafusion repos contain their own benchmarks, you can use benchmarkable-repo as an example. If the benchmarks for apache/arrow-rs and apache/arrow-datafusion will live in a different repo, you can use apache/arrow (with benchmarks in https://github.com/ursacomputing/benchmarks) as an example.

I will be documenting the process for adding a new repo to be benchmarked next week. Let me know if you need it sooner.

I can also answer questions if you want to proceed before the document is ready.

Have a great weekend!

ElenaHenderson commented 2 years ago

Please let me know if you are interested in the abilities below and I will figure out how to make them work for multiple repos.

Benchmark results can be posted into a Slack channel like this:

Benchmark results can be posted as PR comments after a commit is merged into master:

https://github.com/apache/arrow/pull/12275#issuecomment-1029972722

Committers can request Pull Requests to be benchmarked using this comment:

Request: https://github.com/apache/arrow/pull/12164#issuecomment-1014212259

Response: https://github.com/apache/arrow/pull/12164#issuecomment-1014212304

dianaclarke commented 2 years ago

@andygrove I no longer work on Arrow benchmarks, but you might be able to make some use of this initial arrow-datafusion & arrow-rust benchmarking spike:

https://github.com/ursacomputing/benchmarks/pull/79/files

IIUC, you would need to do something similar, but in the arrow-datafusion & arrow-rust repos rather than the ursacomputing/benchmarks repo.

andygrove commented 2 years ago

Thank you @ElenaHenderson and @dianaclarke for the responses. I am putting time aside to work on this over the coming week and will let you know if I have more questions.

dianaclarke commented 2 years ago

@andygrove Here are 2 proof of concept pull requests to get you started.

andygrove commented 2 years ago

Hi @ElenaHenderson. Both the arrow-rs and arrow-datafusion repos now have conbench benchmarks checked in.

Could you point me to the relevant documentation for the next step of adding these to a build pipeline?

ElenaHenderson commented 2 years ago

@andygrove Working on the docs now. Sorry for not getting it done sooner.

ElenaHenderson commented 2 years ago

Hello @andygrove,

The doc for adding a new benchmarkable repo: https://github.com/ursacomputing/arrow-benchmarks-ci/blob/main/docs/how-to-add-new-benchmarkable-repo.md

The doc for adding a new benchmark machine (once the repo is added): https://github.com/ursacomputing/arrow-benchmarks-ci/blob/main/docs/how-to-add-new-benchmark-machine.md

Note that I tested adding the arrow-rs repo this morning and ran its benchmarks on one of the machines (Ubuntu 20.04) where the apache/arrow benchmarks are run, and everything worked. Here are the results of the arrow-rs benchmarks on Conbench:

https://conbench.ursa.dev/runs/acb47dec7d3b460da79d55da1ae9db19/

Ping me if you need anything.

Note that I have since removed all the code I added to test adding the arrow-rs repo.

dianaclarke commented 2 years ago

Nice, thanks @ElenaHenderson!!!

@andygrove I think I've done the next step in this PR: https://github.com/ursacomputing/arrow-benchmarks-ci/pull/57

Which I think means this final step for you: https://github.com/ursacomputing/arrow-benchmarks-ci/blob/main/docs/how-to-add-new-benchmark-machine.md

ElenaHenderson commented 2 years ago

https://github.com/ursacomputing/arrow-benchmarks-ci/pull/57 is merged. @dianaclarke Thank you!

ElenaHenderson commented 2 years ago

I am closing this issue as done since arrow-benchmarks-ci supports adding other repos to be benchmarked:

See doc: https://github.com/ursacomputing/arrow-benchmarks-ci/blob/main/docs/how-to-add-new-benchmarkable-repo.md

voltrondata-labs / arrow-benchmarks-ci

What is the process for running benchmarks on repos other than apache/arrow? #45