Failed to allocate memory for requested buffer of size X

We are trying to deploy a custom model on tritonserver (23.08) with the onnxruntime_backend (onnxruntime version 1.15.1), but inference fails with this error:

onnx runtime error 6: Non-zero status code returned while running Mul node. Name:'Mul_8702' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:368 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 2830172160

There are 7 other models hosted on the same server, and those work fine (even under stress), but things break once this new model is added. Any idea why this might be happening? The server runs on a T4 GPU, and these are our current server settings:

| Option                           | Value     |
|----------------------------------|-----------|
| model_control_mode               | MODE_NONE |
| strict_model_config              | 0         |
| rate_limit                       | OFF       |
| pinned_memory_pool_byte_size     | 268435456 |
| cuda_memory_pool_byte_size{0}    | 67108864  |
| min_supported_compute_capability | 6.0       |
| strict_readiness                 | 1         |
| exit_timeout                     | 30        |
| cache_enabled                    | 0         |
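For scale, here is a small sketch that converts the byte counts above into MiB. The helper name `to_mib` and the dictionary labels are only illustrative and are not part of Triton or onnxruntime; the numbers are copied from the error message and the settings table.

```python
def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


# Values copied from the error message and the server settings above.
sizes = {
    "requested buffer (from the error)": 2830172160,
    "pinned_memory_pool_byte_size": 268435456,
    "cuda_memory_pool_byte_size{0}": 67108864,
}

for name, num_bytes in sizes.items():
    print(f"{name}: {to_mib(num_bytes):.1f} MiB")
```

So the single failing allocation is roughly 2.6 GiB, while the two configured pools are 256 MiB and 64 MiB.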

Any help in understanding what causes this and how to fix it would be appreciated. Thanks!

triton-inference-server / onnxruntime_backend

Failed to allocate memory for requested buffer of size X #249