snarkify / sirius

A Plonkish folding framework for Incrementally Verifiable Computation (IVC).

feat(example): upd examples & add benches #248

Closed cyphersnake closed 1 month ago

cyphersnake commented 1 month ago

Motivation: I'd like to have an easy way to check how a particular code change affects performance, as well as have reports on averages between runs.

Such a solution would allow us to track the impact of certain features quite easily.

Closes #33 & the MVP milestone

Overview: I moved the example inside the benches and added the 'criterion' crate (the de facto default benchmarking solution in the Rust ecosystem). The only downside is that, because the IVC::fold bench is so heavy, we can only run 10 samples, but the spread between runs is very small, so it's not critical.
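Roughly, the bench looks like this (a simplified sketch, not the exact code from this PR; the `fold` helper stands in for the real IVC setup + IVC::fold call):

```rust
// benches/trivial.rs -- simplified sketch; requires a `[[bench]]` entry with
// `harness = false` in Cargo.toml so criterion can provide the harness.
use criterion::{criterion_group, criterion_main, Criterion};

// Placeholder for the real setup + IVC::fold run over `steps` fold steps.
fn fold(steps: usize) {
    let _ = steps;
}

fn ivc_of_trivial(c: &mut Criterion) {
    let mut group = c.benchmark_group("ivc_of_trivial");
    // IVC::fold is heavy, so stay at criterion's minimum of 10 samples.
    group.sample_size(10);
    group.bench_function("fold_1_step", |b| b.iter(|| fold(1)));
    group.bench_function("fold_2_step", |b| b.iter(|| fold(2)));
    group.finish();
}

criterion_group!(benches, ivc_of_trivial);
criterion_main!(benches);
```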

Let me know if you need more info on cargo-criterion and how to use it; I was planning to add it to the README under #247 anyway:

cargo install cargo-criterion; cargo criterion

This command produces a full-fledged performance report, but most importantly, if you run the target code first on the original version and then on the new one, it calculates the performance difference based on the averaged values.
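For example, comparing two versions looks roughly like this (branch names are placeholders):

```sh
# record reference timings on the baseline
git checkout main
cargo criterion

# re-run on the branch under review; the report shows the % change
# against the previously recorded run
git checkout my-feature-branch
cargo criterion
```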

In the future, this crate can also be used to measure the performance of bottlenecks.

cyphersnake commented 1 month ago

If we move the poseidon and trivial to bench, we can remove them from example.

Right now, the examples are the only way to run the whole IVC. If we leave only the bench, the runtime of an IVC run will increase 11 times, because the benchmark makes 10 runs + 1 test run to measure the execution time.

chaosma commented 1 month ago

If we move the poseidon and trivial to bench, we can remove them from example.

Right now, the examples are the only way to run the whole IVC. If we leave only the bench, the runtime of an IVC run will increase 11 times, because the benchmark makes 10 runs + 1 test run to measure the execution time.

Since fold_step is the main function to measure, why not just use the example to run multiple steps instead of using the benchmark to run it 10 times? We can add the number of steps as an optional argument to the example. This way we avoid repeated code and keep the codebase cleaner.
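Something along these lines (just a sketch of the idea; the argument handling is hypothetical):

```rust
// examples/ivc.rs -- sketch: take the number of fold steps as an optional
// CLI argument (default 1) instead of hard-coding it.
use std::env;

fn main() {
    let fold_steps: usize = env::args()
        .nth(1)
        .map(|s| s.parse().expect("fold steps must be an integer"))
        .unwrap_or(1);

    // ... build the step circuit and run IVC folding for `fold_steps` steps ...
    println!("running {fold_steps} fold step(s)");
}
```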

cyphersnake commented 1 month ago

Since fold_step is the main function to measure, why not just use the example to run multiple steps instead of using the benchmark to run it 10 times? We can add the number of steps as an optional argument to the example. This way we avoid repeated code and keep the codebase cleaner.

Well, as I described in the PR, the benchmark is different because it uses cargo-criterion.

It builds average statistics and allows us to track changes in performance automatically instead of manually.

For example, we can delegate this part to CI and track performance growth/decline between Sirius gh-releases.

Duplicating the run of the two circuits is a small price to pay for easily tracking one of the key metrics.

cyphersnake commented 1 month ago

That is, we need 'examples' to:

- run the whole IVC end-to-end by hand, as a developer reference and a profiling entry point.

And we need the benchmark to:

- track fold_step performance automatically, with averaged statistics between runs.

An alternative is to run the example manually and compare the timings on the two versions of the code. If you need an example of how to use it, let me know.

cyphersnake commented 1 month ago

Of course there is an alternative: build the same fold_step performance comparison on top of the script I made earlier for profiling, adding saving of the results to the target folder and comparison against subsequent runs.

drouyang commented 1 month ago

bench is for performance benchmarks and examples are for future developers' reference.

It's okay to run the bench 10 times to get the stats, but it's unnecessary to run the example multiple times.

Sounds okay to me to duplicate some code here.

cyphersnake commented 1 month ago

E.g. 8bcb919 -> 2f47d98 (last 2 months of work):

```
ivc_of_trivial/fold_1_step
                        time:   [2.5683 s 2.5806 s 2.5924 s]
                        change: [-97.414% -97.402% -97.388%] (p = 0.00 < 0.10)
                        Performance has improved.
ivc_of_trivial/fold_2_step
                        time:   [3.9372 s 3.9684 s 4.0019 s]
                        change: [-97.977% -97.959% -97.943%] (p = 0.00 < 0.10)
                        Performance has improved.
```
chaosma commented 1 month ago

I would like to benchmark the following:

Suppose the poseidon hash function is given by y = H(x). In our example the step circuit is simply y = H(x). As for the poseidon benchmark, we can test the following: given a step circuit size n, we benchmark a step circuit of the form y = H^n(x) = H(H(...H(x)...)), i.e. hashing n times. We want to see how the step circuit size affects the IVC folding time, especially when the step circuit size is similar to the IVC folding circuit size and when it is larger than the IVC folding circuit size.
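With criterion this could be expressed as a parameterized benchmark group, roughly like this (the `fold_with_poseidon_repetitions` helper and the choice of sizes are hypothetical):

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

// Hypothetical helper: run one IVC fold step whose step circuit computes
// y = H^n(x), i.e. applies the poseidon hash n times.
fn fold_with_poseidon_repetitions(n: usize) {
    let _ = n;
}

fn ivc_of_poseidon(c: &mut Criterion) {
    let mut group = c.benchmark_group("ivc_of_poseidon");
    // Heavy benches, so stay at criterion's minimum of 10 samples.
    group.sample_size(10);
    for n in [1_usize, 16, 256, 4096] {
        group.bench_with_input(BenchmarkId::from_parameter(n), &n, |b, &n| {
            b.iter(|| fold_with_poseidon_repetitions(n))
        });
    }
    group.finish();
}

criterion_group!(benches, ivc_of_poseidon);
criterion_main!(benches);
```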

cyphersnake commented 1 month ago

I would like to benchmark the following:

Suppose the poseidon hash function is given by y = H(x). In our example the step circuit is simply y = H(x). As for the poseidon benchmark, we can test the following: given a step circuit size n, we benchmark a step circuit of the form y = H^n(x) = H(H(...H(x)...)), i.e. hashing n times. We want to see how the step circuit size affects the IVC folding time, especially when the step circuit size is similar to the IVC folding circuit size and when it is larger than the IVC folding circuit size.

Well, that's hypothesis testing; it shouldn't live in the code permanently. However, it's easy to check with both the benchmark and example + profiling.

Let me create an issue to track the case you are interested in, test it in a separate thread, and post the result to the issue.

I.e. it has nothing to do with this PR specifically; this PR is about automating performance measurements and adding benchmarking infrastructure. But that infrastructure can already be used to test the hypothesis you cited earlier.