Automate benchmarking and add to CI

hanno-becker commented 6 months ago

Depends on: #28

Acceptance criteria:

- Automated benchmarking of MLKEM-C-AArch64 implementations on the listed platforms (see README) is available and can be run by maintainers and by CI.
- Automated benchmarking is added to CI.

Steps:

- [x] Plan access to various benchmarking platforms. For some, the maintainers own boards with the right CPU, but we need to figure out how to access them remotely from CI. For others, such as Graviton instances, we likely need to set up EC2 accounts.
- [x] Provide scripts for running benchmarks and preparing results
- [x] Add benchmarking to CI

planetf1 commented 6 months ago

A few related items:

GitHub does have Arm-based runners. They are currently in private beta and expected to go public in July. (As an enterprise, PQCA can get access.) There are costs involved, and a request is open with the PQCA TAC.
- They are still not appropriate for benchmarking, and I'm not sure what the actual processor support is.
- If funding is needed for other CI support, we can go through the TAC, but we would need some estimates.
- I'm not sure how applicable the above is, given this is a very hardware-specific area.

hanno-becker commented 5 months ago

@planetf1 Leaving aside the benchmarking platforms for which the maintainers have development boards, we would need budget for AWS EC2 instances (Graviton2, Graviton3, Apple M1) for benchmarking and CI. Is there precedent for how to get funding for this in PQCA/LF?

planetf1 commented 5 months ago

In general, the process would be to raise the issue at the TSC. (This becomes more relevant once there are more projects, so we can consolidate and ensure awareness; since the active projects already work together, it is less of an issue now.)

Then we need to raise it with the PQCA TAC for general review; the TAC either approves it (if it is within the existing budget) or raises a request with the governing board (if not).

It sounds like administrivia, but apart from the last step I think we can move quickly. This is my broad understanding, though; we're just getting started. I do join all the PQCA meetings.

Do you have an estimate of resource usage for the AWS EC2 instances? Any thoughts on how this may grow over time (so we can plan for, say, a year)? Are there alternatives to EC2 (we may be asked)? When do you need it (yesterday?)? What's the impact if you don't have it?

There's a PQCA TAC meeting on Wednesday, so if we can get the information together for that, I can add an agenda item for discussion.

hanno-becker commented 5 months ago

> Do you have an estimate on resource usage for the AWS EC2 instances?

Some rough thoughts:

We'd want to benchmark branches and PR revisions. Each individual benchmark should be fast; I'd imagine a few minutes. Therefore, a single EC2 instance per type should be enough to cover CI demand initially, even with a fair amount of PR activity (which we haven't yet reached).

Following https://aws.amazon.com/ec2/pricing/on-demand/, a Graviton3 instance (c7g.xlarge) currently costs $0.1445/hr and a Graviton2 instance (c6g.xlarge) $0.136/hr. If both instances ran 24/7, this would amount to about $209/month. The true cost should be much lower, however, since we are unlikely to reach a level of activity anytime soon that would keep benchmarking CI busy permanently; still, this gives an upper bound for Graviton benchmarking.
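As a quick check of the arithmetic, here is a small Python sketch that recomputes the 24/7 upper bound from the two hourly prices quoted above and scales it by an assumed utilization. The $209 figure corresponds to a 31-day month (744 hours); AWS's usual 730-hour month would give roughly $205. The helper names and the `busy_fraction` parameter are illustrative assumptions, not part of any existing tooling.

```python
# Hourly on-demand prices quoted above (USD/hr). Prices vary by region and
# change over time, so treat these as a snapshot.
PRICES_PER_HOUR = {
    "c7g.xlarge (Graviton3)": 0.1445,
    "c6g.xlarge (Graviton2)": 0.136,
}

# Worst case: a 31-day month with every hour billed.
HOURS_PER_MONTH_31_DAYS = 31 * 24  # 744 hours


def monthly_upper_bound(prices: dict[str, float],
                        hours: float = HOURS_PER_MONTH_31_DAYS) -> float:
    """Cost if every listed instance ran for all `hours` hours."""
    return sum(prices.values()) * hours


def expected_cost(prices: dict[str, float], busy_fraction: float,
                  hours: float = HOURS_PER_MONTH_31_DAYS) -> float:
    """Cost if the instances only run for `busy_fraction` (0..1) of the month,
    e.g. because they are stopped between benchmark jobs (an assumption)."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return monthly_upper_bound(prices, hours) * busy_fraction


print(f"upper bound: ${monthly_upper_bound(PRICES_PER_HOUR):.2f}/month")
# upper bound: $208.69/month
```

For example, `expected_cost(PRICES_PER_HOUR, 0.1)` gives about $20.87, i.e. a tenth of the bound if the instances were busy 10% of the time.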

There is some flexibility in the choice of instance size (medium, large, xlarge, {2,4,8,16}xlarge); the figures above are for xlarge instances with 4 vCPUs, which allow fast build times. Instances with 1 vCPU would be cheaper, and while builds would be slower, they would likely be equally suitable for benchmarking, since all our code is single-threaded.

This does not yet take into account potential M1 instances.

> Any thoughts on how this may grow over time (so we plan for say a year)?

Demand would grow with the frequency of PR updates, so hopefully it will grow over time. However, a single instance per type should remain sufficient for the foreseeable future.

> Are there alternatives to EC2 (we may be asked)?

We have other benchmarking targets independent of EC2, but Graviton2/3 are only available through EC2.

> When do you need it (yesterday?)

When we start optimizing MLKEM-C-AArch64 for performance, which I'd suppose will be in the coming weeks (the first PR to this effect is #38).

> What's the impact if you don't have it?

We would have to conduct ad-hoc measurements for our PRs. I imagine this would effectively lead to incomplete benchmarking information per PR, depending on which boards or EC2 instances the respective maintainer has access to.

hanno-becker commented 5 months ago

@planetf1 I am not in a position to make promises, but one could also apply for cloud credits from AWS (https://aws.amazon.com/government-education/research-and-technical-computing/cloud-credit-for-research/).

planetf1 commented 5 months ago

> @planetf1 I am not in a position to make promises, but one could also apply for cloud credits from AWS (aws.amazon.com/government-education/research-and-technical-computing/cloud-credit-for-research).

Thanks @hanno-becker. From a quick scan of that page, it seems targeted at individual researchers at academic institutions. While such researchers form part of the community working on PQCA projects, we also have a foundation (with funding from commercial organizations), as well as contributors working for commercial organizations.

planetf1 commented 5 months ago

Thanks for all the info on EC2. Do you think GitHub Arm runners would be an alternative? As good? Nearly as good? Not very good?

I'm just asking because I know that option has already been floated, and the Linux Foundation has been working with GitHub on enterprise access and already funds usage on other projects.

There are many open questions here, since it's not quite publicly available yet and I haven't seen the machine specs, but I can find out.

I do think proper integration into CI, with the resources behind it, is important for an implementation that's focused on performance. Manual benchmarking just adds scope for errors and inconsistency and makes regressions much harder to spot, whereas a run after each merge even allows automated checking for performance regressions (perhaps within some bounds, given virtual platforms).
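The automated regression check suggested above could look roughly like the following Python sketch. It compares per-operation cycle counts from a candidate run against a baseline run and flags anything that slowed down by more than a relative tolerance. The `Result` type, the operation names, the 5% default tolerance, and the cycle counts in the example are all made up for illustration; on virtualized platforms the tolerance would need to be tuned to the observed run-to-run noise.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str      # operation label, e.g. "keygen" (hypothetical)
    cycles: float  # e.g. median cycle count for that operation


def find_regressions(baseline: list[Result], candidate: list[Result],
                     tolerance: float = 0.05) -> list[str]:
    """Return the names of operations whose cycle count grew by more than
    `tolerance` (relative) compared with the baseline run.

    Operations missing from the baseline are skipped, since there is
    nothing to compare them against."""
    base = {r.name: r.cycles for r in baseline}
    regressions = []
    for r in candidate:
        ref = base.get(r.name)
        if ref is None or ref <= 0:
            continue
        if (r.cycles - ref) / ref > tolerance:
            regressions.append(r.name)
    return regressions


# Made-up numbers: keygen is 3% slower (within tolerance),
# encaps is about 8.3% slower (flagged).
baseline = [Result("keygen", 100_000), Result("encaps", 120_000)]
candidate = [Result("keygen", 103_000), Result("encaps", 130_000)]
print(find_regressions(baseline, candidate))  # ['encaps']
```

In a CI job, a non-empty result could be turned into a failing exit status so that the regression shows up on the PR.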

ryjones commented 5 months ago

If Graviton is a requirement, it looks like AWS is the only provider. BuildJet Arm builders are something I can set up quickly. AWS will take more work. GitHub offers macOS Arm runners as well.

hanno-becker commented 5 months ago

@ryjones Do we know what hardware underlies the native BuildJet Arm builders?

For benchmarking, we need to know exactly which hardware we are running on. This is in contrast to the functional tests, which also need Arm, but where any platform will do, including an emulated one.
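One way a benchmarking job could record which hardware it actually ran on is to read the `CPU implementer` and `CPU part` fields that Linux exposes in `/proc/cpuinfo` on AArch64. The Python sketch below maps the two part numbers relevant to Graviton (Neoverse N1 in Graviton2, Neoverse V1 in Graviton3) to core names; the function name and the deliberately partial lookup table are assumptions for illustration. It only covers Linux; macOS on Apple M1 does not provide `/proc/cpuinfo`.

```python
# Deliberately partial map of Arm Ltd. (implementer 0x41) part numbers.
ARM_CORE_PARTS = {
    0xD0C: "Neoverse N1",  # core used in AWS Graviton2
    0xD40: "Neoverse V1",  # core used in AWS Graviton3
}


def identify_cores(cpuinfo_text: str) -> set[str]:
    """Return the set of core names found in /proc/cpuinfo-style text.

    Each processor block lists "CPU implementer" before "CPU part", so the
    most recent implementer value applies to the following part value."""
    cores: set[str] = set()
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part" and implementer is not None:
            part = int(value, 16)
            if implementer == 0x41:
                cores.add(ARM_CORE_PARTS.get(part, f"unknown Arm part {part:#x}"))
            else:
                cores.add(f"implementer {implementer:#x}, part {part:#x}")
    return cores


# On a real machine: identify_cores(open("/proc/cpuinfo").read())
sample = "processor\t: 0\nCPU implementer\t: 0x41\nCPU part\t: 0xd0c\n"
print(identify_cores(sample))  # {'Neoverse N1'}
```

Logging this set alongside each benchmark result makes it possible to check afterwards that numbers from different runs really came from the same core type.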

ryjones commented 5 months ago

@hanno-becker It looks like AWS is the only source for Graviton. So let's do it bit by bit: get what we can now and add AWS later (by later, I mean weeks or a month, not months).

The thing with AWS is that I need to get those resources under the MSA that the LF has with Amazon, which takes some paper shuffling.

hanno-becker commented 5 months ago

@ryjones Sounds good to me.

planetf1 commented 5 months ago

Thanks @ryjones for looking into this

hanno-becker commented 5 months ago

@ryjones

> BuildJet arm builders are something I can do quickly.

I think that might still be useful for testing and dynamic analysis. Can you give us some pointers on how to use the BuildJet Arm runners?

ryjones commented 5 months ago

@hanno-becker Once approved, I will connect them to the org and you can use them like any other runner. Here are the docs.

ryjones commented 5 months ago

@planetf1 @hanno-becker The runner name is pqcp-arm64. You could convert this job into a matrix and add that runner as a target.

See this as an example.
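In a GitHub Actions workflow, a `strategy.matrix` produces one job per combination of its values, and `runs-on: ${{ matrix.runner }}` lets each job pick its runner from the matrix. The following Python sketch only illustrates that expansion (it is not workflow syntax and ignores `include`/`exclude` entries); the `runner`/`build` keys and the `debug`/`release` values are hypothetical, while `pqcp-arm64` is the runner name given above and `ubuntu-latest` is GitHub's standard x86_64 Linux runner label.

```python
from itertools import product

# Hypothetical matrix: GitHub's x86_64 Linux runner plus the pqcp-arm64 runner.
# The "build" values are illustrative placeholders.
MATRIX = {
    "runner": ["ubuntu-latest", "pqcp-arm64"],
    "build": ["debug", "release"],
}


def expand(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one job (key -> value) per combination of values, i.e. the
    Cartesian product, as a strategy matrix without include/exclude does."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


for job in expand(MATRIX):
    print(job)
# {'runner': 'ubuntu-latest', 'build': 'debug'}
# {'runner': 'ubuntu-latest', 'build': 'release'}
# {'runner': 'pqcp-arm64', 'build': 'debug'}
# {'runner': 'pqcp-arm64', 'build': 'release'}
```

Adding `pqcp-arm64` to the runner list doubles the number of jobs without duplicating the job definition, which is the appeal of converting the job into a matrix.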

planetf1 commented 5 months ago

@ryjones So pqcp-arm64 is a BuildJet runner? Is there a link or summary of what it is (CPU, processors, RAM, etc.)?

ryjones commented 5 months ago

@planetf1 No, it is a GitHub runner.

- Runner group: [pqcp-large-runners](https://github.com/enterprises/post-quantum-cryptography-alliance/settings/actions/runner-groups/3)
- Platform: Linux ARM64 Beta
- Size: 4 cores · 16 GB RAM · 150 GB SSD
- Public IP: Disabled
- Network Configuration: Disabled

I can make it larger or smaller as you need.

hanno-becker commented 5 months ago

@ryjones This is great, thank you very much. Leveraged in #49.

hanno-becker commented 2 months ago

EC2 benchmarking added in #99

pq-code-package / mlkem-native

Automate benchmarking and add to CI #34