vadimkantorov opened 3 years ago
Would it work for GPU? Does the allocator name carry any semantics? I could not find an example for "gpu" or "cuda".
What about memory allocation tracing/logging? Is there such a capability?
Thanks!
Yes, the CUDA allocators do use the BFCArena to allocate a large chunk. See https://github.com/microsoft/onnxruntime/blob/cb8d8464bc6ee39a894ecc76e05574095f5eb489/onnxruntime/core/providers/cuda/cuda_execution_provider.cc#L75-L83

Have you tried using nvprof for debugging?
`nvprof` is a good idea, thanks!
What is the current behaviour of the standard CUDA execution provider in `InferenceSession.run` with respect to allocations? Will it cache them between runs? What are the conditions for eviction? Are there any tweaks for the CUDA allocator (e.g. to configure the chunk size)? My setup uses varying-length batches for seq2seq (of course, we will work around this by padding up to fixed-length batch shapes, but having proper allocator controls for inference would be good).
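For what it's worth, recent Python releases expose some of these knobs as CUDA execution provider options. A minimal sketch, assuming an onnxruntime build with CUDA support; the option names below are from the CUDA EP documentation, and their availability depends on the onnxruntime version:

```python
import onnxruntime

cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # cap the BFC arena at 2 GB
    "arena_extend_strategy": "kSameAsRequested",  # grow by the requested size instead of powers of two
}
session = onnxruntime.InferenceSession(
    "model.onnx",  # hypothetical model path
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```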
I followed the reference code for the GPU memory allocator, shown below:
```python
import onnxruntime

expected_kvp_allocator = {
    "max_mem": 16,
    "arena_extend_strategy": 1,
    "initial_chunk_size_bytes": 10,
    "max_dead_bytes_per_chunk": 4,
    "initial_growth_chunk_size_bytes": 2,
}
# ort_arena_cfg_kvp = onnxruntime.OrtArenaCfg(8, 0, 4, 2)
ort_arena_cfg_kvp = onnxruntime.OrtArenaCfg(expected_kvp_allocator)
```
However, it does not seem to have any effect when running the code.
How do I configure pre-allocation of GPU memory with the Python bindings?
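The closest thing I found is the shared-allocator route used in onnxruntime's Python test suite. A sketch, unverified against a CUDA build — the registration below targets the CPU arena, and I don't know whether the same route applies to the CUDA arena:

```python
import onnxruntime as ort

# Arena configuration; the key names mirror the C API's CreateArenaCfgV2.
arena_cfg = ort.OrtArenaCfg({
    "max_mem": 0,                # 0 = no cap
    "arena_extend_strategy": 1,  # 1 = kSameAsRequested
    "initial_chunk_size_bytes": 256 * 1024 * 1024,
    "max_dead_bytes_per_chunk": 0,
    "initial_growth_chunk_size_bytes": 0,
})

# Register the arena as a shared (environment-level) allocator ...
mem_info = ort.OrtMemoryInfo("Cpu", ort.OrtAllocatorType.ORT_ARENA_ALLOCATOR, 0, ort.OrtMemType.DEFAULT)
ort.create_and_register_allocator(mem_info, arena_cfg)

# ... and opt the session in to using the environment allocators.
so = ort.SessionOptions()
so.add_session_config_entry("session.use_env_allocators", "1")
session = ort.InferenceSession("model.onnx", sess_options=so, providers=["CPUExecutionProvider"])
```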
I found `initial_chunk_size_bytes` documented at https://www.onnxruntime.ai/docs/reference/api/c-api.html, but there is no way to set it from Python.

Is it possible to preallocate a large GPU memory chunk at session start? We're seeing a large efficiency difference depending on whether the previous batch had the same size, and I wonder if we have allocator hiccups. Is there any way to debug allocations? Does verbose mode help?
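By verbose mode I mean something like the sketch below. The `log_severity_level` and profiling properties are from the public Python API; whether the verbose log actually surfaces arena events is something I have not confirmed:

```python
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0   # 0 = VERBOSE
so.enable_profiling = True  # ORT's built-in profiler, writes a JSON trace

session = ort.InferenceSession("model.onnx", sess_options=so, providers=["CUDAExecutionProvider"])
# ... run some batches ...
profile_path = session.end_profiling()  # path of the JSON trace file
```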
Thanks!