snarkify / sirius

A Plonkish folding framework for Incrementally Verifiable Computation (IVC).
MIT License
108 stars 15 forks source link

feat(profile): add timer profile for folding #243

Closed chaosma closed 2 months ago

chaosma commented 2 months ago

Issue Link / Motivation To better understand the performance bottleneck of folding, we need have a timer to profile each step/component.

Changes Overview Added the timer profile. Example looks like:

··Start:   Fold 1 step
····Start:   VanillaFS::prove: secondary_cross_terms
······Start:   commit cross term
········Start:   fold transform
········End:     fold transform ....................................................2.001ms
········Start:   cross term evaluation
········End:     cross term evaluation .............................................11.043s
······End:     commit cross term ...................................................11.147s
······Start:   fold instance-witness pair
······End:     fold instance-witness pair ..........................................3.892ms
····End:     VanillaFS::prove: secondary_cross_terms ...............................11.151s
····Start:   process_step: primary
····End:     process_step: primary .................................................398.542µs
····Start:   process primary instance
····End:     process primary instance ..............................................176.583µs
····Start:   create primary ivc circuit
····End:     create primary ivc circuit ............................................1.250µs
····Start:   generate primary plonk trace
····End:     generate primary plonk trace ..........................................149.452ms
····Start:   VanillaFS::prove: primary_cross_term
······Start:   commit cross term
········Start:   fold transform
········End:     fold transform ....................................................1.967ms
········Start:   cross term evaluation
········End:     cross term evaluation .............................................11.060s
······End:     commit cross term ...................................................11.196s
······Start:   fold instance-witness pair
······End:     fold instance-witness pair ..........................................3.596ms
····End:     VanillaFS::prove: primary_cross_term ..................................11.200s
····Start:   process_step: secondary
····End:     process_step: secondary ...............................................429.292µs
····Start:   process secondary instance
····End:     process secondary instance ............................................165.875µs
····Start:   create secondary ivc circuit
····End:     create secondary ivc circuit ..........................................481.403ms
····Start:   generate secondary plonk trace
····End:     generate secondary plonk trace ........................................148.273ms
··End:     Fold 1 step .............................................................23.222s
drouyang commented 2 months ago

cross term evaluation .............................................11.043s cross term evaluation .............................................11.060s

do they include the MSM time? what is the number of scalars for the commit?

chaosma commented 2 months ago

cross term evaluation .............................................11.043s cross term evaluation .............................................11.060s

do they include the MSM time? what is the number of scalars for the commit?

It doesn't include the MSM time. In this example, it is at most 2^19. After we import the halo2's expression evaluation algorithm, this number should be reduced.

drouyang commented 2 months ago

It doesn't include the MSM time. In this example, it is at most 2^19. After we import the halo2's expression evaluation algorithm, this number should be reduced.

which row accounts for the MSM time?

cyphersnake commented 2 months ago

Discussed by voice, @chaosma approved the idea of logical scopes where tracing::span & timer are handled together. I will make a PR for this branch

cyphersnake commented 2 months ago

The tracing::span out of the box knows how long a particular span has taken, displaying the event in the log.

I see how it works and then it will be necessary to just track these events separately and then we will get time-profiling out of the box even in real environment

cyphersnake commented 2 months ago
2024-05-07T14:22:02.284697Z  INFO creating pp:primary:compressing: sirius::plonk: close time.busy=211µs time.idle=701ns
2024-05-07T14:22:02.284877Z  INFO creating pp:primary:CircuitRunner::collect_plonk:compressing: sirius::plonk: close time.busy=115µs time.idle=220ns
2024-05-07T14:22:03.249545Z  INFO creating pp:primary:CircuitRunner::collect_plonk: sirius::table::circuit_runner: close time.busy=965ms time.idle=882ns
2024-05-07T14:22:03.249562Z  INFO creating pp:primary: sirius::ivc::public_params: close time.busy=965ms time.idle=401ns
2024-05-07T14:22:03.249811Z  INFO creating pp:secondary:compressing: sirius::plonk: close time.busy=173µs time.idle=390ns
2024-05-07T14:22:03.257637Z  INFO creating pp:secondary:CircuitRunner::collect_plonk:compressing: sirius::plonk: close time.busy=227µs time.idle=4.23µs
2024-05-07T14:22:04.197939Z  INFO creating pp:secondary:CircuitRunner::collect_plonk: sirius::table::circuit_runner: close time.busy=941ms time.idle=3.17µs
2024-05-07T14:22:04.585533Z  INFO creating pp:secondary:CircuitRunner::collect_witness: sirius::table::circuit_runner: close time.busy=388ms time.idle=1.60µs
2024-05-07T14:22:05.080399Z  INFO creating pp:secondary: sirius::ivc::public_params: close time.busy=1.83s time.idle=1.15µs
2024-05-07T14:23:06.385542Z  INFO creating pp: sirius::ivc::public_params: close time.busy=64.1s time.idle=2.87µs
2024-05-07T14:23:08.003430Z  INFO primary{step=0}:CircuitRunner::collect_witness: sirius::table::circuit_runner: close time.busy=414ms time.idle=2.15µs
2024-05-07T14:23:08.608376Z  INFO primary{step=0}: sirius::ivc::incrementally_verifiable_computation: close time.busy=2.22s time.idle=1.56µs
2024-05-07T14:23:10.009753Z  INFO secondary{step=0}:CircuitRunner::collect_witness: sirius::table::circuit_runner: close time.busy=361ms time.idle=2.26µs
2024-05-07T14:23:10.518157Z  INFO secondary{step=0}: sirius::ivc::incrementally_verifiable_computation: close time.busy=1.91s time.idle=1.37µs
2024-05-07T14:23:10.520247Z  INFO primary{step=1}:GraphEvaluator::new: sirius::polynomial::graph_evaluator: close time.busy=490µs time.idle=391ns

In general it looks like it's easy enough to get the same functionality as presented in this PR, but with tracing::span (and thus grouped logs) without arc_std. Internally it works the same via std::time.

I'll probably modify the mask of a particular line with time profiling, and also make a proposal that describes the logical blocks (what we will actually profile).

This will be a more general mechanism

cyphersnake commented 2 months ago

I think I'll try to switch to structured logs (to make it convenient to parse data from json), make at least 10 runs with criterion (without caches), and average the percentages by span, how much each scope takes on average in the overall runtime.

drouyang commented 2 months ago

I don't think the logging data is more helpful than the current PR.

The GPU team is familiar with the format of the current PR, hence it's easy for the GPU to optimize the performance.

It's not immediately clear to me why logs are easier to use.

Please postpone your a new PR as for now. The GPU engineers need to be involved.

drouyang commented 2 months ago

The consumer of these output data is the performance team, so the output data needs to fit our needs. For example, understanding how many major phases are there in a folding step, and which phase is the most time consuming one that we should focus on. It's not immediately clear to me with the new logging format.

For Sirius internal benchmarking, it could be something different. But this PR was originally for the performance team.

cyphersnake commented 2 months ago

For now it's grep by "time.busy" for INFO logs This from out server:

2024-05-07T19:12:27.931168Z  INFO poseidon_example:get_or_create_commitment_key{k=27 label="bn256"}: poseidon: close time.busy=3.40s time.idle=2.40µs
2024-05-07T19:12:31.228868Z  INFO poseidon_example:get_or_create_commitment_key{k=27 label="grumpkin"}: poseidon: close time.busy=3.30s time.idle=3.41µs

2024-05-07T19:12:31.229492Z  INFO poseidon_example:pp:primary:compressed_gates: sirius::plonk: close time.busy=441µs time.idle=780ns
2024-05-07T19:12:31.230794Z  INFO poseidon_example:pp:primary:circuit_runner::collect_struct:compressed_gates: sirius::plonk: close time.busy=1.18ms time.idle=890ns
2024-05-07T19:12:33.618531Z  INFO poseidon_example:pp:primary:circuit_runner::collect_struct: sirius::table::circuit_runner: close time.busy=2.39s time.idle=1.69µs
2024-05-07T19:12:33.618576Z  INFO poseidon_example:pp:primary: sirius::ivc::public_params: close time.busy=2.39s time.idle=820ns
2024-05-07T19:12:33.619451Z  INFO poseidon_example:pp:secondary:compressed_gates: sirius::plonk: close time.busy=788µs time.idle=490ns
2024-05-07T19:12:33.643274Z  INFO poseidon_example:pp:secondary:circuit_runner::collect_struct:compressed_gates: sirius::plonk: close time.busy=242µs time.idle=1.00µs
2024-05-07T19:12:35.078337Z  INFO poseidon_example:pp:secondary:circuit_runner::collect_struct: sirius::table::circuit_runner: close time.busy=1.44s time.idle=3.03µs
2024-05-07T19:12:35.692019Z  INFO poseidon_example:pp:secondary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=614ms time.idle=3.01µs
2024-05-07T19:12:36.346118Z  INFO poseidon_example:pp:secondary: sirius::ivc::public_params: close time.busy=2.73s time.idle=2.63µs
2024-05-07T19:14:01.379436Z  INFO poseidon_example:pp: sirius::ivc::public_params: close time.busy=90.2s time.idle=2.77µs

2024-05-07T19:14:02.589518Z  INFO poseidon_example:ivc::new{step=0}:primary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=951ms time.idle=3.09µs
2024-05-07T19:14:03.733178Z  INFO poseidon_example:ivc::new{step=0}:primary: sirius::ivc::incrementally_verifiable_computation: close time.busy=2.35s time.idle=1.37µs
2024-05-07T19:14:04.338602Z  INFO poseidon_example:ivc::new{step=0}:secondary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=605ms time.idle=1.91µs
2024-05-07T19:14:05.043506Z  INFO poseidon_example:ivc::new{step=0}:secondary: sirius::ivc::incrementally_verifiable_computation: close time.busy=1.31s time.idle=2.48µs
2024-05-07T19:14:05.072757Z  INFO poseidon_example:ivc::new{step=0}: sirius::ivc::incrementally_verifiable_computation: close time.busy=3.69s time.idle=2.00µs

2024-05-07T19:14:06.178898Z  INFO poseidon_example:ivc::fold_step{step=1}:primary:relaxed::fold: sirius::plonk: close time.busy=17.5ms time.idle=3.64µs
2024-05-07T19:14:07.348764Z  INFO poseidon_example:ivc::fold_step{step=1}:primary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=976ms time.idle=3.63µs
2024-05-07T19:14:10.107443Z  INFO poseidon_example:ivc::fold_step{step=1}:primary:relaxed::fold: sirius::plonk: close time.busy=27.2ms time.idle=3.68µs
2024-05-07T19:14:10.116225Z  INFO poseidon_example:ivc::fold_step{step=1}:primary: sirius::ivc::incrementally_verifiable_computation: close time.busy=5.04s time.idle=780ns
2024-05-07T19:15:35.864290Z  INFO poseidon_example:ivc::fold_step{step=1}:secondary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=629ms time.idle=3.88µs
2024-05-07T19:15:36.583436Z  INFO poseidon_example:ivc::fold_step{step=1}:secondary: sirius::ivc::incrementally_verifiable_computation: close time.busy=86.5s time.idle=3.10µs
2024-05-07T19:15:36.614179Z  INFO poseidon_example:ivc::fold_step{step=1}: sirius::ivc::incrementally_verifiable_computation: close time.busy=91.5s time.idle=2.17µs

2024-05-07T19:15:37.723293Z  INFO poseidon_example:ivc::fold_step{step=2}:primary:relaxed::fold: sirius::plonk: close time.busy=17.4ms time.idle=4.01µs
2024-05-07T19:15:38.893790Z  INFO poseidon_example:ivc::fold_step{step=2}:primary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=972ms time.idle=3.11µs
2024-05-07T19:15:41.645868Z  INFO poseidon_example:ivc::fold_step{step=2}:primary:relaxed::fold: sirius::plonk: close time.busy=26.2ms time.idle=3.79µs
2024-05-07T19:15:41.655023Z  INFO poseidon_example:ivc::fold_step{step=2}:primary: sirius::ivc::incrementally_verifiable_computation: close time.busy=5.04s time.idle=1.11µs
2024-05-07T19:17:07.453499Z  INFO poseidon_example:ivc::fold_step{step=2}:secondary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=631ms time.idle=3.50µs
2024-05-07T19:17:08.174948Z  INFO poseidon_example:ivc::fold_step{step=2}:secondary: sirius::ivc::incrementally_verifiable_computation: close time.busy=86.5s time.idle=2.54µs
2024-05-07T19:17:08.205490Z  INFO poseidon_example:ivc::fold_step{step=2}: sirius::ivc::incrementally_verifiable_computation: close time.busy=91.6s time.idle=2.90µs

2024-05-07T19:17:09.323211Z  INFO poseidon_example:ivc::fold_step{step=3}:primary:relaxed::fold: sirius::plonk: close time.busy=18.3ms time.idle=3.89µs
2024-05-07T19:17:10.488656Z  INFO poseidon_example:ivc::fold_step{step=3}:primary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=969ms time.idle=3.69µs
2024-05-07T19:17:13.249343Z  INFO poseidon_example:ivc::fold_step{step=3}:primary:relaxed::fold: sirius::plonk: close time.busy=25.0ms time.idle=3.81µs
2024-05-07T19:17:13.258710Z  INFO poseidon_example:ivc::fold_step{step=3}:primary: sirius::ivc::incrementally_verifiable_computation: close time.busy=5.05s time.idle=850ns
2024-05-07T19:18:38.991040Z  INFO poseidon_example:ivc::fold_step{step=3}:secondary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=632ms time.idle=3.47µs
2024-05-07T19:18:39.710308Z  INFO poseidon_example:ivc::fold_step{step=3}:secondary: sirius::ivc::incrementally_verifiable_computation: close time.busy=86.5s time.idle=2.37µs
2024-05-07T19:18:39.741408Z  INFO poseidon_example:ivc::fold_step{step=3}: sirius::ivc::incrementally_verifiable_computation: close time.busy=91.5s time.idle=2.45µs

2024-05-07T19:18:40.869731Z  INFO poseidon_example:ivc::fold_step{step=4}:primary:relaxed::fold: sirius::plonk: close time.busy=21.3ms time.idle=4.69µs
2024-05-07T19:18:42.040552Z  INFO poseidon_example:ivc::fold_step{step=4}:primary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=973ms time.idle=3.75µs
2024-05-07T19:18:44.799611Z  INFO poseidon_example:ivc::fold_step{step=4}:primary:relaxed::fold: sirius::plonk: close time.busy=25.6ms time.idle=3.57µs
2024-05-07T19:18:44.808964Z  INFO poseidon_example:ivc::fold_step{step=4}:primary: sirius::ivc::incrementally_verifiable_computation: close time.busy=5.07s time.idle=1.03µs
2024-05-07T19:20:10.508101Z  INFO poseidon_example:ivc::fold_step{step=4}:secondary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=632ms time.idle=3.55µs
2024-05-07T19:20:11.227849Z  INFO poseidon_example:ivc::fold_step{step=4}:secondary: sirius::ivc::incrementally_verifiable_computation: close time.busy=86.4s time.idle=2.57µs
2024-05-07T19:20:11.259262Z  INFO poseidon_example:ivc::fold_step{step=4}: sirius::ivc::incrementally_verifiable_computation: close time.busy=91.5s time.idle=2.56µs

2024-05-07T19:20:12.388213Z  INFO poseidon_example:ivc::fold_step{step=5}:primary:relaxed::fold: sirius::plonk: close time.busy=18.2ms time.idle=4.10µs
2024-05-07T19:20:13.565859Z  INFO poseidon_example:ivc::fold_step{step=5}:primary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=976ms time.idle=3.27µs
2024-05-07T19:20:16.320667Z  INFO poseidon_example:ivc::fold_step{step=5}:primary:relaxed::fold: sirius::plonk: close time.busy=26.5ms time.idle=3.09µs
2024-05-07T19:20:16.330006Z  INFO poseidon_example:ivc::fold_step{step=5}:primary: sirius::ivc::incrementally_verifiable_computation: close time.busy=5.07s time.idle=780ns
2024-05-07T19:21:42.067585Z  INFO poseidon_example:ivc::fold_step{step=5}:secondary:circuit_runner::collect_witness: sirius::table::circuit_runner: close time.busy=632ms time.idle=3.89µs
2024-05-07T19:21:42.784959Z  INFO poseidon_example:ivc::fold_step{step=5}:secondary: sirius::ivc::incrementally_verifiable_computation: close time.busy=86.5s time.idle=2.24µs
2024-05-07T19:21:42.816755Z  INFO poseidon_example:ivc::fold_step{step=5}: sirius::ivc::incrementally_verifiable_computation: close time.busy=91.6s time.idle=2.84µs

2024-05-07T19:21:46.582339Z  INFO poseidon_example:ivc::vefify: sirius::ivc::incrementally_verifiable_computation: close time.busy=3.77s time.idle=2.35µs

2024-05-07T19:21:47.388148Z  INFO poseidon_example: poseidon: close time.busy=563s time.idle=11.6µs

I continue to break down pieces of code into logical scope so that we have a detailed breakdown. Then I will build percentages of which modules take how long to complete

cyphersnake commented 2 months ago

Make alternative in #246