microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

CUDA memory increasing and process freeze [Performance] #22872

Open kkluonaitis opened 6 days ago


Describe the issue

In production I run a long-t5 model for data processing, and I tried upgrading to onnxruntime-gpu 1.19.0. I run 3 processes on the same instance, sharing GPU resources, but after a gradual increase in GPU memory usage all processes effectively freeze. In nvidia-smi I could still see the processes holding some GPU memory (not all of it), but the application logs simply stopped. Rolling back to onnxruntime 1.18.0 works fine; current dependencies do not allow upgrading to 1.20.0. I know that sharing a GPU between processes may not be best practice, but it is cost efficient and worked until now.

Any ideas what could be eating up the memory?

To reproduce

The model I use: https://huggingface.co/agemagician/mlong-t5-tglobal-large
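For context, a minimal sketch of how the sessions might be set up so each process keeps a bounded CUDA arena (this is not the reporter's actual code; `gpu_mem_limit` and `arena_extend_strategy` are documented `CUDAExecutionProvider` options, and the model path and 4 GiB cap are placeholders):

```python
# Hedged sketch: capping the per-process CUDA memory arena so several
# processes can share one GPU. The option names below are documented
# CUDAExecutionProvider session options; the path and limit are
# placeholders, not values from this issue.

CUDA_OPTIONS = {
    "device_id": 0,
    "gpu_mem_limit": 4 * 1024 * 1024 * 1024,      # cap the arena at 4 GiB
    "arena_extend_strategy": "kSameAsRequested",  # extend only by what is requested
}

def make_session(model_path: str):
    """Create an inference session that prefers CUDA with a capped arena."""
    import onnxruntime as ort  # requires the onnxruntime-gpu wheel
    providers = [("CUDAExecutionProvider", CUDA_OPTIONS), "CPUExecutionProvider"]
    return ort.InferenceSession(model_path, providers=providers)
```

With a cap like this, an unbounded arena in one process cannot starve the other two; whether the 1.19.0 regression is in the arena itself would still need confirmation from the maintainers.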

Urgency

No response

Platform

Linux

OS Version

Amazon Linux AMI 2.0.20230606 x86_64

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

Model File

No response

Is this a quantized model?

No