starkware-libs / stone-prover


OOM Optimisation Configs #6

Closed maxharrison00 closed 1 year ago

maxharrison00 commented 1 year ago

I have written an e-voting protocol in Cairo0 and I'm trying to collect performance data from the STONE prover for various input sizes (on my personal machine with only 16 GB of RAM). I can successfully run the prover on inputs of length 2^6: this generates a trace of 2^22 rows and 27 columns, for a total of 113,246,208 cells. The log files (using the default cpu_air_prover_config.json) show a peak memory usage of 8528 MB (I'm assuming RM: 8528mb, AM: 8721mb means that 8721 MB was allocated and 8528 MB was actually resident; please correct me if I'm wrong).
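
For reference, here is the back-of-the-envelope arithmetic behind those numbers (my own Python sanity check; the 32 bytes per field element is an assumption on my part, not something taken from the logs):

    # Rough footprint of one copy of the 2^6-input trace.
    rows, cols = 2**22, 27
    cells = rows * cols                    # 113,246,208 cells
    bytes_per_element = 32                 # assumption: 256-bit field elements
    trace_bytes = cells * bytes_per_element
    print(f"cells          = {cells:,}")
    print(f"raw trace size = {trace_bytes / 2**20:.0f} MiB")  # ~3,456 MiB
    # Observed peak in the logs: RM: 8528mb, so a bit over two trace copies.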

However, I can't get larger inputs (>= 2^7) to complete without the process being terminated by the Linux OOM killer. I have run the protocol with inputs of length 2^7 using several cpu_air_prover_config.json variants with increasingly aggressive out-of-memory settings; one of the most aggressive configurations I tried looks like this:

{
    "cached_lde_config": {
        "store_full_lde": false,
        "use_fft_for_eval": false
    },
    "constraint_polynomial_task_size": 256,
    "n_out_of_memory_merkle_layers": 16,
    "table_prover_n_tasks_per_segment": 32
} 
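
For completeness, this is roughly how I'm invoking the prover with that config (a sketch only; the flag names follow the example in the repository README, and the file names are placeholders for my protocol's artifacts):

    import subprocess

    # Sketch of the prover invocation (flag names as in the repo README example;
    # the file names are placeholders for my e-voting protocol's artifacts).
    subprocess.run(
        [
            "cpu_air_prover",
            "--out_file=evoting_proof.json",
            "--private_input_file=evoting_private_input.json",
            "--public_input_file=evoting_public_input.json",
            "--prover_config_file=cpu_air_prover_config.json",
            "--parameter_file=cpu_air_params.json",
        ],
        check=True,
    )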

The logs indicate that the prover successfully generates the trace and begins committing to it before using too much memory during LDE. Using the default prover config, the logs show a peak memory usage of 13151 MB for a couple of rounds of LDE before the process is killed. Using the aggressive prover config above actually results in the process being killed even earlier for using too much memory. Since the process doesn't finish, the log doesn't show the size of the generated trace, but I would expect it to be roughly double the length of the trace for the shorter input above.

This leads me to the following questions:

  1. Does this behaviour make sense? I can't see why increasing n_out_of_memory_merkle_layers doesn't reduce memory usage at least to some extent. Is the "aggressive" configuration above actually what is recommended for minimising memory usage?
  2. Is there any way out of this other than just throwing more RAM at the problem? I'm not sure how the Cairo0 code could be optimised any further. Would you suggest the problem is likely with the Cairo0 code, or is there something else that might be going wrong?

As I understand it, I should just be able to increase n_out_of_memory_merkle_layers by some constant for every sequential increase in the log of the length of the input in order to decrease the amount of memory used by the prover. An increase of 1 in n_out_of_memory_merkle_layers should halve the amount of memory used during LDE by the prover, so I'm not sure why this wouldn't work (given that the smaller lengths 2^1 - 2^6 all work fine).
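
To make my assumption explicit, this is the toy model I have in mind (nothing here is taken from the prover source; log_lde_len = 25 is just an illustrative value):

    # Toy model of my assumption: committing to an LDE column of 2^log_lde_len
    # leaves while keeping the bottom n_oom_layers of the Merkle tree out of
    # memory should leave roughly 2^(log_lde_len - n_oom_layers) hashes resident,
    # i.e. each extra out-of-memory layer halves the resident data.
    def assumed_resident_bytes(log_lde_len: int, n_oom_layers: int,
                               bytes_per_hash: int = 32) -> int:
        return 2 ** max(log_lde_len - n_oom_layers, 0) * bytes_per_hash

    for layers in (0, 4, 8, 16):
        mib = assumed_resident_bytes(log_lde_len=25, n_oom_layers=layers) / 2**20
        print(f"n_out_of_memory_merkle_layers={layers:>2} -> ~{mib:,.1f} MiB (per this model)")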

gkaempfer commented 1 year ago

STONE always needs to keep two trace-sized cosets in memory, regardless of the OOM settings: one holding the coefficients of the trace polynomials and one for the coset currently being generated. On top of that there are other memory requirements, e.g. during FRI. Each field element in the trace takes 32 bytes (256 bits), so your 2^7 instance would consume 2 * 2^23 * 27 * 32 = 14,495,514,624 bytes. That probably doesn't leave enough for your Linux OS to run in, not to mention a few additional memory overheads. Unfortunately, you would need more RAM unless you can optimize your Cairo0 code further.
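
Spelling out that arithmetic (same assumptions as above: 2^23 rows, 27 columns, 32 bytes per field element, two trace-sized cosets resident):

    # Lower bound on resident memory for the 2^7 instance.
    rows, cols, bytes_per_element = 2**23, 27, 32
    one_coset = rows * cols * bytes_per_element   # 7,247,757,312 bytes
    two_cosets = 2 * one_coset                    # 14,495,514,624 bytes
    print(f"one coset  = {one_coset:,} bytes ({one_coset / 2**30:.2f} GiB)")
    print(f"two cosets = {two_cosets:,} bytes ({two_cosets / 2**30:.2f} GiB)")
    # ~13.5 GiB before counting FRI, the OS and other overheads on a 16 GB machine.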
