lurk-lab / lurk-rs

Lurk is a Turing-complete programming language for recursive zk-SNARKs. It is a statically scoped dialect of Lisp, influenced by Scheme and Common Lisp.
https://lurk-lang.org/
Apache License 2.0

Benchmarks (umbrella) #283

Open porcuquine opened 1 year ago

porcuquine commented 1 year ago

Benchmark Organization in 5 Points

  1. Need for Diverse Benchmarking Infrastructure: We need a comprehensive benchmarking infrastructure that measures various aspects across different benchmarks, with an emphasis on developing shared vocabulary and terminology.

    • Next Steps: Develop and implement a diverse set of benchmarks and agree on common terminology for discussion.
  2. Static Analysis of Performance: Certain performance characteristics, such as the efficiency of the Lurk reduction algorithm, can be understood 'statically' from code changes alone and do not require testing on powerful machines.

    • Next Steps: Continue static analysis for algorithmic improvements and integrate findings into tests.
  3. Real-world Performance Measurement: There’s a need to measure real-world performance of the proving pipeline, including aspects like proof time, throughput, hardware requirements, and parameter size.

    • Next Steps: Establish benchmarking infrastructure for real-world performance metrics and periodically validate these metrics.
  4. Non-regression Testing: The importance of non-regression testing is emphasized to prevent unexpected performance issues due to changes in code or dependencies.

    • Next Steps: Implement non-regression tests to block merging if negative performance impact is detected. #790
  5. FoldingConfig and NIVC Considerations: The introduction of NIVC necessitates considering performance variations and benchmarking different FoldingConfigs.

    • Next Steps: Experimentally determine the optimal FoldingConfig and develop benchmarks to measure their performance.
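As a starting point for points 3 and 4, per-phase wall-clock measurement can be sketched with the standard library alone. This is a minimal sketch, not the repository's actual harness (the real benchmarks would likely use something like criterion); the `parse` and `evaluate` workloads below are stand-ins, not lurk-rs entry points:

```rust
use std::time::{Duration, Instant};

/// Record the wall-clock time of one pipeline phase.
/// In a real benchmark, `f` would be a lurk-rs phase such as
/// parsing, evaluation, proving, or verification.
fn time_phase<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    let elapsed = start.elapsed();
    println!("{label}: {elapsed:?}");
    (out, elapsed)
}

fn main() {
    // Stand-in workloads; replace with real parse/evaluate/prove/verify calls.
    let (_len, t_parse) = time_phase("parse", || "(+ 1 2)".len());
    let (_sum, t_eval) = time_phase("evaluate", || (0..10_000u64).sum::<u64>());
    assert!(t_parse >= Duration::ZERO && t_eval >= Duration::ZERO);
}
```

Capturing each phase separately, rather than only the end-to-end time, is what makes the non-regression comparison in point 4 actionable: a regression report can then name the phase that slowed down.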

Older Brainstorm: What to benchmark?

We need good benchmarks that expose the performance of many parts of the system so we can:

It would be good to have a general end-to-end benchmark that starts with Lurk source code (text), and ends with a successfully verified proof (first milestone: #378). We should then also have a corpus of 'interesting' input programs. This issue is about what the standard benchmark we run on each such program should look like:

We should make sure to measure each phase that happens with every proof:

We should also measure the parts that can be factored out:

We should measure a range of parameters, though not necessarily every combination, since the number of combinations grows quickly. These should at least include:

Things we should measure for the above:
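One way to keep the measured quantities consistent across benchmarks is a shared record type. The field names below are illustrative assumptions, not the repository's actual schema:

```rust
/// One row of benchmark output. Field names are hypothetical
/// placeholders for whatever the agreed-on shared vocabulary becomes.
#[derive(Debug, Clone)]
struct BenchRecord {
    program: String,       // name of the Lurk input program
    rc: usize,             // reduction count per step (a swept parameter)
    iterations: usize,     // number of Lurk reductions performed
    proving_ms: u128,      // wall-clock proving time
    verify_ms: u128,       // wall-clock verification time
    proof_bytes: usize,    // serialized proof size
    peak_rss_bytes: usize, // peak memory during proving
}

impl BenchRecord {
    /// Throughput in reductions per second, a single comparable figure.
    fn reductions_per_sec(&self) -> f64 {
        self.iterations as f64 / (self.proving_ms as f64 / 1000.0)
    }
}
```

A fixed record like this also makes periodic validation (point 3 above) straightforward: successive runs produce directly comparable rows.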

huitseeker commented 1 year ago

Update: