Update benchmarking section

baentsch commented 2 months ago

The data at https://openquantumsafe.org/benchmarking/ is by now pretty dated (basically all at the level right before OQS was folded into PQCA) and does not contain information about current & standardized algorithms.

This issue is to suggest adding a suitable caveat on the website to not disappoint users trying to find current figures there (e.g., of NIST standardized or on-ramp algorithms).

Alternatively, the profiling sub project and the corresponding runs to generate data could be re-activated (with a definite timeline) given community interest (how to gauge? how to find doers? Maybe add wording asking for such folks on the website?).

Tagging @dstebila @SWilson4 as to your thinking/preference. Please note that many issues have been closed in the profiling sub project assuming a new benchmarking sub project would take its place. Those probably ought to be re-activated and worked on instead, then.

dstebila commented 2 months ago

Let's put this on the agenda for an upcoming OQS status meeting -- asking for help around reactivating profiling and/or re-vitalizing.

Otherwise yes I agree with having appropriate caveats around the version that is currently there on the website, and am happy to have a PR with appropriate text.

With regards to the standardized algorithms, to some extent benchmarking is not quite as important, as that question has been somewhat settled: there's lots of evidence out there now that ML-KEM and ML-DSA are mostly fine from a performance perspective. Although we will want to track and demonstrate the performance of our implementation of these standards.

As we get back into adding more algorithms from the signature on-ramp, then there will be more research relevance to showing performance characteristics of candidates.

baentsch commented 2 months ago

asking for help around reactivating profiling and/or re-vitalizing.

We can try -- but the "re-vitalization" of oqs-demos also didn't yield concrete results. You can shame, trick or force someone to say Yes to (do) something in a meeting, but I think this will not be successful: People must be motivated to do something (either intrinsically as they believe in the benefit of it/want to support the team or because their company has a problem that needs fixing) or there's a million ways to not act. As you say

With regards to the standardized algorithms, to some extent benchmarking is not quite as important, as that question has been somewhat settled

so what about changing the statement regarding profiling to exactly state that, e.g., "Suspended until NIST competition or other projects need benchmarking data again. Arguments and Contributors for this welcome."

Remember: I developed all of this on your and @christianpaquin 's request to support your NCCoE project. Without such "itch" what's the reason to work on this?

christianpaquin commented 2 months ago

Remember: I developed all of this on your and @christianpaquin 's request to support your NCCoE project.

FYI, we're planning a new round of perf testing later this fall once the FIPS versions have been integrated in the various TLS components. If this server is still available and running, we'll happily surface its results.

dstebila commented 2 months ago

Remember: I developed all of this on your and @christianpaquin 's request to support your NCCoE project.

FYI, we're planning a new round of perf testing later this fall once the FIPS versions have been integrated in the various TLS components. If this server is still available and running, we'll happily surface its results.

Christian, do you think there are any members of the NCCoE relying on this who would be willing to spend some time helping improve our profiling system?

christianpaquin commented 2 months ago

Christian, do you think there are any members of the NCCoE relying on this who would be willing to spend some time helping improve our profiling system?

Maybe. I'll bring it up with them

baentsch commented 2 months ago

If this server is still available and running, we'll happily surface its results.

It sounds like you have sufficient other results to "surface", so what'd be the (incremental) benefits of bringing this section "up to scratch" for you/NCCoE, @christianpaquin ?

there's lots of evidence out there now that ML-KEM and ML-DSA are mostly fine from a performance perspective.

There's no doubt about that (reason for the existence of the benchmarking section gone) @dstebila, but this statement

Although we will want to track and demonstrate the performance of our implementation of these standards.

leads to the key question: Does OQS want to have and demonstrate a good performance (why otherwise measure it?). If it wants this (?), it should make this a priority, maybe asking someone to be its "performance czar"? Better of course would be a "product manager" looking after all of OQS(' unique selling propositions, with performance as one element). Reminder: Getting this was a key reason why the project originally agreed to accept the drawbacks of the LF take-over. Why did this not happen? Maybe a(nother) question for TSC, TAC, GB?

This question also has a reading on https://github.com/open-quantum-safe/liboqs/issues/1426 and performance-improving proposals such as https://github.com/pq-crystals/kyber/pull/85 (in this case, rejected by the upstream).

Resultant technical question here: Does OQS want its (performance) charateristics be controlled by the upstreams or make its own decisions in this regard, e.g., inviting @yoswuk to contribute (e.g., via patch) here?

More basic question: Is anyone looking after moving the "strategic" aspects of these questions to resolution? As per https://github.com/PQCA/TAC/issues/40#issuecomment-2315647010, asking @KennyPaul for input.

christianpaquin commented 2 months ago

It sounds like you have sufficient other results to "surface", so what'd be the (incremental) benefits of bringing this section "up to scratch" for you/NCCoE, @christianpaquin ?

We aimed to provide performance metrics for all tested implementations, including OQS. The benefit is for all consumers of the NIST/NCCoE report to get this info about OQS.

KennyPaul commented 2 months ago

@baentsch Also updated https://github.com/PQCA/TAC/issues/40 in a somewhat similar fashion. If by "looking after" you mean "who is driving", that responsibility belongs to the appropriate governing body, which would either be the TAC or the OQS TSC as appropriate. -kenny

baentsch commented 2 months ago

We aimed to provide performance metrics for all tested implementations, including OQS. The benefit is for all consumers of the NIST/NCCoE report to get this info about OQS.

Understood. Then it is mostly in the interest of OQS to provide this information (and to have comparatively good results), not so much that of NCCoE.

open-quantum-safe / www

Update benchmarking section #216