microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Memory allocation failures due to incorrect requested buffer size #18743

Open OvervCW opened 9 months ago

OvervCW commented 9 months ago

Describe the issue

I'm using NVIDIA Triton to perform inference on various detection models with ONNX Runtime, and this has always worked fine. However, after upgrading ONNX Runtime from version 1.13.1 to 1.16.0, I started occasionally getting errors like this:

2023-12-07 10:14:43.214806225 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_5/model_4/model_2/res2b_branch2c/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13622061778317179392
2023-12-07 10:14:50.385533092 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_7/model_6/res3d_branch2a/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4375604526317872384
2023-12-07 10:14:51.047598146 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'model_73/model_72/res2a_branch1/Conv2D' Status Message: /workspace/onnxruntime/onnxruntime/core/framework/bfc_arena.cc:376 void onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 13671442273597750016

The requested buffer sizes are obviously completely wrong (the first one amounts to roughly 13.6 exabytes), so this seems like a bug in ONNX Runtime or Triton, but I don't know enough about their interaction to tell where exactly it is coming from.

I should note that it is not reliably reproducible. It doesn't happen for every inference request, and restarting the inference server is sometimes enough to fix the problem without changing anything about the models or the Triton/ONNX Runtime version.

To reproduce

I've only been able to reproduce this with 2 specific models so far, both CenterNet detection models.
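A minimal standalone loop along these lines could help isolate whether the failure comes from ONNX Runtime itself or from the Triton backend. This is only a sketch: the model path, input name, output name, and input shape below are placeholders, not details from the actual models.

// Standalone sketch: run one of the affected models repeatedly with the CUDA
// execution provider, outside of Triton. All model-specific values are
// hypothetical placeholders.
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

int main() {
  // Verbose logging so any arena allocation details show up in the output.
  Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "repro");

  Ort::SessionOptions session_options;
  OrtCUDAProviderOptions cuda_options{};  // defaults: device 0, BFC arena
  session_options.AppendExecutionProvider_CUDA(cuda_options);

  // Placeholder model path.
  Ort::Session session(env, "centernet.onnx", session_options);

  // Dummy NCHW input; replace the name and shape with the model's real input.
  std::vector<int64_t> shape{1, 3, 512, 512};
  std::vector<float> data(1 * 3 * 512 * 512, 0.5f);
  Ort::MemoryInfo mem_info =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem_info, data.data(), data.size(), shape.data(), shape.size());

  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};

  // Run many times, since the failure is intermittent under Triton.
  for (int i = 0; i < 1000; ++i) {
    try {
      auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input,
                                 1, output_names, 1);
    } catch (const Ort::Exception& e) {
      std::cerr << "Run " << i << " failed: " << e.what() << std::endl;
    }
  }
  return 0;
}

If a loop like this never triggers the error, that would point more toward the Triton ONNX Runtime backend (or its concurrent use of sessions) than toward ONNX Runtime itself.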

Urgency

Medium. We upgraded to a new version of Triton/ONNX Runtime to fix minor issues with some other models, so we'd prefer not to have to downgrade.

Platform

Linux

OS Version

Ubuntu 22.04.3 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.2.2

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

OvervCW commented 8 months ago

It is definitely still an issue we are dealing with.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

OvervCW commented 7 months ago

Still an issue.