If we move the poseidon and trivial examples to bench, we can remove them from examples.
Right now the examples are the only way to run the whole IVC. If we leave only the bench, the runtime of an IVC run will increase 11 times, because the benchmark makes 10 runs + 1 test run to measure the execution time.
Since fold_step is the main function to measure, why not just use the example to run multiple steps instead of using the benchmark to run it 10 times? We can add the number of steps as an optional argument when running the example. This way we can avoid repeated code and keep the codebase cleaner.
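For illustration, a minimal sketch of what that optional argument could look like; `run_ivc` here is a hypothetical stand-in for the existing example body, not real code from this repository:

```rust
use std::env;

fn main() {
    // Default to a single fold step; allow e.g. `cargo run --example ivc -- 10`.
    let num_steps: usize = env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(1);

    run_ivc(num_steps);
}

// Hypothetical wrapper around the existing example logic: set up once,
// then repeat fold_step `num_steps` times.
fn run_ivc(num_steps: usize) {
    for step in 0..num_steps {
        println!("fold step {step}");
    }
}
```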
Well, as I described in the PR, the benchmark is different: it uses cargo-criterion.
It builds average statistics and allows us to track changes in performance automatically instead of manually.
For example, we can delegate this part to CI and track performance growth/decline between Sirius gh-releases.
Duplicating the running of two circuits is a small price to pay to easily track one of the key metrics.
That is, we need 'examples' to:
And we need the benchmark to:
An alternative is to manually run the example and compare the time on two versions of the code. If you need an example of how to use it, let me know.
Of course there is an alternative: write the same fold_step performance-comparison logic on top of the script I made earlier for profiling, adding to it saving the results to the target folder and comparing them with subsequent runs.
bench is for performance benchmarking and examples are for future developers' reference.
It's okay to run the bench 10 times to get the stats, but it's unnecessary to run the example multiple times.
Sounds okay to me to duplicate some code here.
E.g. 8bcb919 -> 2f47d98 (last 2 months of work):
ivc_of_trivial/fold_1_step
time: [2.5683 s 2.5806 s 2.5924 s]
change: [-97.414% -97.402% -97.388%] (p = 0.00 < 0.10)
Performance has improved.
ivc_of_trivial/fold_2_step
time: [3.9372 s 3.9684 s 4.0019 s]
change: [-97.977% -97.959% -97.943%] (p = 0.00 < 0.10)
Performance has improved.
I would like to benchmark the following:
Suppose the poseidon hash function is given by y = H(x). In our example the step circuit is simply y = H(x).
As for the poseidon benchmark, we can test the following. Given a step circuit size n, we benchmark a step circuit of the form y = H^n(x) = H(H(...H(x)...)) (i.e. hash n times). We want to see how the step circuit size affects the IVC folding time. In particular, how is the folding time affected when the step circuit size is similar to the IVC circuit size, and when it is larger than the IVC folding circuit size?
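As a rough sketch of that parameterization (this is not the Sirius API; `poseidon_hash` below is a plain placeholder for the real hash), the step function would simply iterate the hash n times:

```rust
// Compute y = H^n(x) by applying the hash n times.
fn hash_n_times(x: u64, n: usize) -> u64 {
    (0..n).fold(x, |acc, _| poseidon_hash(acc))
}

// Stand-in for the real Poseidon hash over field elements; the actual
// benchmark would constrain the real permutation inside the step circuit.
fn poseidon_hash(x: u64) -> u64 {
    x.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(13)
}

fn main() {
    // Growing n grows the step circuit, which is the variable we want to
    // correlate with the IVC folding time.
    for n in [1, 8, 64, 512] {
        println!("H^{n}(42) = {}", hash_n_times(42, n));
    }
}
```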
Well, that's hypothesis testing; it shouldn't live in the code permanently. However, it's easily tested with both the benchmark and example+profiling.
Let me create an issue to track the case you are interested in, test it in a separate thread, and post the result in the issue.
I.e. this has nothing to do with this PR specifically; the PR is about automating performance measurements and adding benchmarking infrastructure. But it can already be used to test the hypothesis you described earlier.
Motivation
I'd like to have an easy way to check how a particular code change affects performance, as well as have reports on averages between runs.
Such a solution would allow us to track the impact of certain features quite easily.
Close #33 & MVP milestone
Overview
I just moved the example inside the benches and added the 'criterion' crate (effectively the default solution in the Rust ecosystem). The only downside is that, because the IVC::fold bench is so heavy, we can only run 10 samples, but the spread between runs is very small, so it's not critical.
Let me know if you need more info on cargo-criterion and how to use it, but I was planning to add it to the README under #247 anyway.
This command produces a full performance report, but most importantly, if you run the target code first on the original version and then on the new one, it will calculate the performance difference taking the averaged values into account.
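For example (assuming the stock criterion baseline flags rather than anything project-specific), you can run `cargo bench -- --save-baseline before` on the original revision and then `cargo bench -- --baseline before` on the new one to get a relative-change report like the ivc_of_trivial output shown above.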
In the future, this crate can also be used to measure the performance of bottlenecks.
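For reference, here is a minimal sketch of what such a criterion bench could look like; this is an illustration only, with the IVC setup and the actual IVC::fold / fold_step call replaced by a placeholder:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_fold(c: &mut Criterion) {
    let mut group = c.benchmark_group("ivc_of_trivial");
    // fold_step is expensive, so keep the sample count at criterion's minimum.
    group.sample_size(10);
    group.bench_function("fold_1_step", |b| {
        b.iter(|| {
            // Placeholder for the real IVC setup and IVC::fold call.
            std::hint::black_box((0..1_000u64).sum::<u64>())
        })
    });
    group.finish();
}

criterion_group!(benches, bench_fold);
criterion_main!(benches);
```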