triton-inference-server / onnxruntime_backend

The Triton backend for the ONNX Runtime.
BSD 3-Clause "New" or "Revised" License

Failed to allocate memory for requested buffer of size X #249

Open aaditya-srivathsan opened 8 months ago

aaditya-srivathsan commented 8 months ago

I was trying to deploy a custom model on tritonserver (23.08) with the onnxruntime_backend (onnxruntime version 1.15.1), but we are hitting this error:

onnx runtime error 6: Non-zero status code returned while running Mul node. Name:'Mul_8702' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2830172160

There are 7 other models hosted on the same server, and those work fine (even under stress), but things break once this new model is added. Any idea why this might be happening? The server runs on a T4 GPU, and these are our current settings:

| model_control_mode               | MODE_NONE |
| strict_model_config              | 0         |
| rate_limit                       | OFF       |
| pinned_memory_pool_byte_size     | 268435456 |
| cuda_memory_pool_byte_size{0}    | 67108864  |
| min_supported_compute_capability | 6.0       |
| strict_readiness                 | 1         |
| exit_timeout                     | 30        |
| cache_enabled                    | 0         |
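
For scale, the raw byte counts in the error message and in the settings above are easier to compare in binary units. The snippet below is only a rough sketch of that arithmetic: the dictionary labels and helper names are made up for illustration, and it does not establish whether either Triton pool is involved in the failing BFCArena allocation.

```python
# Byte counts copied verbatim from the error message and the server settings.
# The labels are illustrative; only the numbers come from the report.
SIZES = {
    "requested buffer (Mul_8702)": 2830172160,
    "pinned_memory_pool_byte_size": 268435456,
    "cuda_memory_pool_byte_size{0}": 67108864,
}


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


for name, n_bytes in SIZES.items():
    print(f"{name}: {to_mib(n_bytes):.2f} MiB ({to_gib(n_bytes):.2f} GiB)")
```

So the single buffer requested for the Mul node is about 2.64 GiB, while the pinned and CUDA pools are configured at 256 MiB and 64 MiB respectively. A T4 nominally has 16 GB of device memory, which here is shared with the 7 other models.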

Any help understanding what causes this and how to fix it would be appreciated. Thanks!

DataXujing commented 1 month ago

I have the same issue!