ingonyama-zk / icicle

a GPU Library for Zero-Knowledge Acceleration
MIT License

[BUG]: CUDA out of memory error #527

Closed zkbitcoin closed 2 months ago

zkbitcoin commented 2 months ago

Description


  1. cd to the directory

/git/icicle/wrappers/rust/icicle-curves/icicle-bls12-377

  2. cargo build ; cargo bench

  3. out of memory message:

Benchmarking bls12_377 MSM / 65536 x 16 with precomp = 1: Collecting 10 samples in estimated 12.647 s (10 ite
bls12_377 MSM / 65536 x 16 with precomp = 1
time: [1.4704 s 1.4706 s 1.4708 s]
change: [-0.0163% +0.0110% +0.0335%] (p = 0.43 > 0.05)
No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Benchmarking bls12_377 MSM / 65536 x 128 with precomp = 1: Warming up for 3.0000 s

CUDA Runtime Error by: cudaMallocAsync(&sort_indices_temp_storage, sort_indices_temp_storage_bytes, stream)
at: /home/amilkowski/git/icicle/icicle/src/msm/msm.cu:450
out of memory

CUDA Runtime Error by: bucket_method_msm( bitsize, c, scalars, points, config.batch_size, msm_size, (config.points_size == 0) ? msm_size : config.points_size, results, config.are_scalars_on_device, config.are_scalars_montgomery_form, config.are_points_on_device, config.are_points_montgomery_form, config.are_results_on_device, config.is_big_triangle, config.large_bucket_factor, config.precompute_factor, config.is_async, stream)
at: /home/amilkowski/git/icicle/icicle/src/msm/msm.cu:845
out of memory

thread 'main' panicked at icicle-curves/icicle-bls12-377/benches/msm.rs:5:1:
called Result::unwrap() on an Err value: IcicleError { icicle_error_code: InternalCudaError, cuda_error: Some(cudaErrorMemoryAllocation), reason: Some("Runtime CUDA error.") }

Reproduce

Using the same GPU and environment, repeat the steps above.

Expected Behavior

The software should make it possible to work within the memory limit of a given card, for instance by letting the user lower batch_size (or other variables) through an environment variable, e.g. export ICICLE_BATCH_SIZE=8.


Environment

linux

Please complete the following information:

OS + Version:

Ubuntu 22.04.4 LTS

Cargo Version:

1.78.0

GPU type:

[GeForce GTX 1080 Ti]

Additional context


I think just being able to override the batch size (and perhaps other parameters) would be helpful, if only to keep testing and benchmarking going.

zkbitcoin commented 2 months ago

I overrode c right after https://github.com/ingonyama-zk/icicle/blob/main/icicle/src/msm/msm.cu#L837 by setting

c = 8;

With that, the bench was able to run a bit further (a bit slower, but not by a huge amount),

but then it ran out of memory on larger size attempt at:

bls12_377 MSM / 131072 x 16 with precomp = 1

The card is not that old. I'm now trying c = 4, but it's hacky because it isn't done via an env variable or similar.
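To see why the window size c matters for memory, here is a rough back-of-the-envelope model of bucket storage in a bucket-method MSM. It is only a sketch, not icicle's actual allocation logic: it assumes one set of 2^c buckets per c-bit window, a 253-bit scalar for bls12_377, and 144 bytes per projective point (three 48-byte coordinates). The function names are made up for illustration, and the failing allocation in the log above is the sort temp storage, which this model does not cover.

```rust
/// Number of c-bit windows needed to cover a scalar of `scalar_bits` bits
/// (ceiling division).
fn num_windows(scalar_bits: u32, c: u32) -> u32 {
    (scalar_bits + c - 1) / c
}

/// Rough bucket count for a single MSM, assuming 2^c buckets per window.
/// This is an assumption of the sketch, not a statement about icicle.
fn bucket_count(scalar_bits: u32, c: u32) -> u64 {
    num_windows(scalar_bits, c) as u64 * (1u64 << c)
}

/// Rough bytes of bucket storage for a batch of MSMs, assuming every MSM
/// in the batch gets its own buckets and each bucket holds one point.
fn bucket_bytes(scalar_bits: u32, c: u32, batch_size: u64, point_bytes: u64) -> u64 {
    bucket_count(scalar_bits, c) * batch_size * point_bytes
}
```

Under these assumptions, bucket_bytes(253, 16, 128, 144) comes to 18 GiB, while bucket_bytes(253, 8, 128, 144) is 144 MiB. A smaller c shrinks the buckets sharply but doubles the number of windows each time c is halved, which is consistent with the slowdown reported here.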

omershlo commented 2 months ago

I will let others comment more to the point about your ICICLE_BATCH_SIZE idea. I just wanted to say that the card is from 2017, which is old.

zkbitcoin commented 2 months ago

I know, I feel bad; technology moves so fast. Thank you so much. I'm trying c = 4 now. As with many other projects, I'm doing this for education, research, and learning. I will let you know if 4 succeeds.

So far (lol, but expected):

bls12_377 MSM / 32768 x 1 with precomp = 1
time: [605.63 ms 605.70 ms 605.77 ms] change: [+507.24% +507.82% +508.40%] (p = 0.00 < 0.05) Performance has regressed.

zkbitcoin commented 2 months ago

@omershlo

  1. c = 4 fails (at bls12_377 MSM / 32768 x 1 with precomp = 1)
  2. /git/icicle$ grep -i -r ICICLE_BATCH_SIZE . finds no such variable anywhere in the repository

One could say there is a size limit beyond which the benchmarks should only be run on more modern cards.

LeonHibnik commented 2 months ago

@zkbitcoin you can limit the benchmark size by setting MAX_LOG2 and MIN_LOG2 env vars in your runs

zkbitcoin commented 2 months ago

"@zkbitcoin you can limit the benchmark size by setting MAX_LOG2 and MIN_LOG2 env vars in your runs"

just did it at the very time you made this comment

export MAX_LOG2=15 succeeds, so it's flexible (or already was). I can close this ticket as "functions as designed" if you want.

zkbitcoin commented 2 months ago

Functions as designed: use the MAX_LOG2 and MIN_LOG2 environment variables to set lower limits for any given card.
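For readers wondering how a benchmark could honor such limits, below is a minimal sketch of reading MIN_LOG2 and MAX_LOG2 and turning them into a list of MSM sizes. The helper names and the fallback behavior are assumptions made for illustration; this is not copied from icicle's benchmark code, and the defaults passed in are placeholders.

```rust
use std::env;

/// Parse an optional raw value into a log2 size, falling back to `default`
/// when the variable is unset or is not a valid unsigned integer.
fn parse_log2(raw: Option<&str>, default: u32) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok()).unwrap_or(default)
}

/// Read MIN_LOG2 / MAX_LOG2 from the environment.
/// The defaults are placeholders, not icicle's real defaults.
fn log2_range_from_env(default_min: u32, default_max: u32) -> (u32, u32) {
    let min = parse_log2(env::var("MIN_LOG2").ok().as_deref(), default_min);
    let max = parse_log2(env::var("MAX_LOG2").ok().as_deref(), default_max);
    // Keep the range well-formed even if MIN_LOG2 is set above MAX_LOG2.
    (min.min(max), max)
}

/// Sizes a benchmark loop would visit: 2^min, 2^(min+1), ..., 2^max.
fn msm_sizes(min_log2: u32, max_log2: u32) -> Vec<usize> {
    (min_log2..=max_log2).map(|l| 1usize << l).collect()
}
```

With MAX_LOG2=15 the largest size such a loop would visit is 32768, which stays below the 65536 and 131072 cases that ran out of memory earlier in this thread.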