triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Stateful decoupled bls model: malloc_consolidate(): unaligned fastbin chunk detected #7517

Open 007durgesh219 opened 3 months ago

007durgesh219 commented 3 months ago

Description
I am getting memory corruption crashes with a stateful BLS model. It seems like Triton is trying to free memory that is still in use.

Triton Information
24.07

Are you using the Triton container or did you build it yourself?
I used the Triton container 24.07.

To Reproduce
I have a stateful decoupled BLS model (Python backend) that triggers this issue. The model takes each input and puts it on an internal queue; a worker thread consumes the queue, calls a sequence of other models, and finally returns the response of the last model.
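The queue-plus-worker structure described above can be sketched as follows. This is a minimal stdlib-only sketch, not the actual model: the real `pb_utils.InferenceRequest(...).exec()` BLS calls and model names are replaced by plain callables, and the decoupled response sender is stubbed as a callback.

```python
import queue
import threading

class StatefulBlsModelSketch:
    """Sketch of the pattern: execute() enqueues each request, and a
    worker thread runs the chunk through a sequence of models, then
    sends the final result through the response sender (a callback here)."""

    def __init__(self, pipeline):
        # pipeline: callables standing in for BLS calls to the other models
        self.pipeline = pipeline
        self.q = queue.Queue()
        self.worker = threading.Thread(target=self._consume, daemon=True)
        self.worker.start()

    def execute(self, chunk, send_response):
        # Producer side: enqueue the audio chunk with its response sender.
        self.q.put((chunk, send_response))

    def _consume(self):
        # Consumer side: run each model in sequence, return the last output.
        while True:
            chunk, send_response = self.q.get()
            out = chunk
            for model in self.pipeline:
                out = model(out)  # stands in for a BLS inference call
            send_response(out)
            self.q.task_done()
```

In the real model, `execute()` returns immediately (decoupled mode) and the worker thread owns the response senders until it has sent the final response, which is why a premature free by Triton would corrupt memory still in use.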

It runs for 1-2 minutes, then suddenly crashes and the model gets killed. There are no error logs other than the following:

  1. model: malloc_consolidate(): unaligned fastbin chunk detected
  2. Reference count error detected: an attempt was made to deallocate the dtype 17 (O)

I've attached logs with this issue.
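For reference, NumPy type number 17 is the object dtype (`'O'`), so the second error is about deallocating an array of Python objects:

```python
import numpy as np

# NumPy assigns each dtype a small integer type number; 17 is the
# object ('O') dtype, matching the "dtype 17 (O)" in the error message.
print(np.dtype("O").num)      # type number of the object dtype
print(np.dtype(object).char)  # its character code, 'O'
```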

The model input comes from a streaming client that sends audio chunks every 50 milliseconds. The BLS model passes those chunks through a set of other models sequentially and finally returns the output to the streaming client.

Expected behavior
Should not crash.

Attachments: crashlog15.txt, crashlog16.txt