Open Quba1 opened 3 weeks ago
Sure would love the help :)
Yup, seeing this in actions when running training
=== PANIC ===
A fatal error happened, you can check the experiment logs here => '/tmp/guide/experiment.log'
=============
thread 'main' panicked at /Users/ragy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cubecl-runtime-0.3.0/src/memory_management/memory_manage.rs:283:35:
index out of bounds: the len is 6 but the index is 6
cubecl_runtime::memory_management::memory_manage::MemoryManagement<Storage>::reserve()
(below) can panic whensize
is bigger than any availablepool
.https://github.com/tracel-ai/cubecl/blob/3882ed25b47506d49562c501a179b7468e61702e/crates/cubecl-runtime/src/memory_management/memory_manage.rs#L275-L288
In my case (running [4] 2_062_401_536 B. So
burn
) my program attempted to allocate 2_925_527_040 bytes with pools max sizes: [0] 4_028_128 B [1] 4_028_128 B [2] 32_225_024 [3]pool_ind
was computed as5
and&mut self.pools[pool_ind]
panicked.The function can panic in edge-cases because per docs
partition_point()
requires that:and the condition seems to not be checked at any point.
The same issue also occurs a few lines lower in
alloc()
.The solution is probably a simple bounds check or capping
pool_ind
topools.len() - 1
or some more complex error handling.Happy to help with solving the problem.