vadimkantorov opened 3 years ago
Would it work for GPU? Does the allocator name carry any semantics? I could not find an example for "gpu" or "cuda".
What about memory allocation tracing/logging? Is there such a capability?
Thanks!
Yes, the CUDA allocators do use the BFCArena to allocate a large chunk. See https://github.com/microsoft/onnxruntime/blob/cb8d8464bc6ee39a894ecc76e05574095f5eb489/onnxruntime/core/providers/cuda/cuda_execution_provider.cc#L75-L83

Have you tried using nvprof for debugging?
`nvprof` is a good idea, thanks!
What is the current behaviour of the standard CUDA execution provider in `InferenceSession.run` with respect to allocations? Will it cache them between runs? What are the conditions for eviction? Are there any tweaks for the CUDA allocator (e.g. to configure the chunk size)? My setup uses varying-length batches for seq2seq (of course, we will work around this by padding up to fixed-length batch shapes, but having proper allocator controls for inference would be good).
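For what it's worth, recent Python releases expose some of these knobs as CUDA execution provider options. A minimal sketch, assuming an onnxruntime build with CUDA support; the option names below are from the CUDA EP documentation, and their availability depends on the onnxruntime version:

```python
import onnxruntime

cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # cap the BFC arena at 2 GB
    "arena_extend_strategy": "kSameAsRequested",  # grow by the requested size instead of powers of two
}
session = onnxruntime.InferenceSession(
    "model.onnx",  # hypothetical model path
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```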
I followed the reference code for the GPU memory allocator, shown below:
```python
import onnxruntime

expected_kvp_allocator = {
    "max_mem": 16,
    "arena_extend_strategy": 1,
    "initial_chunk_size_bytes": 10,
    "max_dead_bytes_per_chunk": 4,
    "initial_growth_chunk_size_bytes": 2,
}
# ort_arena_cfg_kvp = onnxruntime.OrtArenaCfg(8, 0, 4, 2)
ort_arena_cfg_kvp = onnxruntime.OrtArenaCfg(expected_kvp_allocator)
```
However, it does not seem to have any effect when running the code.
How do I configure pre-allocation of GPU memory with the Python bindings?
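The closest thing I found is the shared-allocator route used in onnxruntime's Python test suite. A sketch, unverified against a CUDA build — the registration below targets the CPU arena, and I don't know whether the same route applies to the CUDA arena:

```python
import onnxruntime as ort

# Arena configuration; the key names mirror the C API's CreateArenaCfgV2.
arena_cfg = ort.OrtArenaCfg({
    "max_mem": 0,                # 0 = no cap
    "arena_extend_strategy": 1,  # 1 = kSameAsRequested
    "initial_chunk_size_bytes": 256 * 1024 * 1024,
    "max_dead_bytes_per_chunk": 0,
    "initial_growth_chunk_size_bytes": 0,
})

# Register the arena as a shared (environment-level) allocator ...
mem_info = ort.OrtMemoryInfo("Cpu", ort.OrtAllocatorType.ORT_ARENA_ALLOCATOR, 0, ort.OrtMemType.DEFAULT)
ort.create_and_register_allocator(mem_info, arena_cfg)

# ... and opt the session in to using the environment allocators.
so = ort.SessionOptions()
so.add_session_config_entry("session.use_env_allocators", "1")
session = ort.InferenceSession("model.onnx", sess_options=so, providers=["CPUExecutionProvider"])
```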
I found `initial_chunk_size_bytes` documented at https://www.onnxruntime.ai/docs/reference/api/c-api.html, but there is no way to set it from Python.

Is it possible to preallocate a large GPU memory chunk at session start? We're seeing a large efficiency difference depending on whether the previous batch had the same size, and I wonder if we have allocator hiccups. Is there any way to debug allocations? Does verbose mode help?
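By verbose mode I mean something like the sketch below. The `log_severity_level` and profiling properties are from the public Python API; whether the verbose log actually surfaces arena events is something I have not confirmed:

```python
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0   # 0 = VERBOSE
so.enable_profiling = True  # ORT's built-in profiler, writes a JSON trace

session = ort.InferenceSession("model.onnx", sess_options=so, providers=["CUDAExecutionProvider"])
# ... run some batches ...
profile_path = session.end_profiling()  # path of the JSON trace file
```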
Thanks!