k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0
3.7k stars 430 forks

Memory Leak #1571

Closed janjanusek closed 1 hour ago

janjanusek commented 4 hours ago

Hi there, I found a very ugly bug that appears no matter which offline transcription model I use.

I split audio into speech segments and run transcription on each one. After a while, RAM consumption explodes from 2 GB to 8 GB and above. When I dispose of the model, memory goes back to normal, so it looks like something is accumulating in there (just a thought).

I'm going to PROD soon and this is a crucial part, so I hope you can find time to check it out, as I'm not fluent in the C API.

I'm using the latest version of sherpa-onnx, mostly with Zipformer and Moonshine through the C# API, to give you a reference. Yes, I'm properly disposing of the stream every time, and it always happens no matter the number of threads.

csukuangfj commented 4 hours ago

What is the max length among the segments?

csukuangfj commented 4 hours ago

Also, could you share all the code?

Or is it possible to reproduce it with our examples?

janjanusek commented 2 hours ago

Okay, you're right. I'm starting to see a very close relationship between max input size and memory consumption. But why? Can I turn it off? With a 31-second chunk it was consuming over 3 GB; when the chunk was over a minute long, it jumped to 7.9 GB.

Can you explain how it works under the hood so I can at least adapt?

csukuangfj commented 1 hour ago

Please see our VAD + ASR example to handle long wave files.

You are not supposed to input long waves to any non-streaming models.
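
For readers who want to see what "not feeding long waves" can look like in practice, here is a minimal sketch (not taken from the sherpa-onnx examples) that caps the length of every segment before it reaches a non-streaming recognizer. The names `split_into_chunks`, `transcribe_bounded`, and `transcribe_chunk` are hypothetical, and the 30-second cap is an arbitrary assumption rather than a documented limit. `transcribe_chunk` stands in for whatever call you already use to decode one segment.

```python
from typing import Callable, Iterator, Sequence


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int,
    max_seconds: float = 30.0,  # assumed cap, tune for your model and memory budget
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each at most `max_seconds` long."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    max_len = int(sample_rate * max_seconds)
    if max_len < 1:
        raise ValueError("max_seconds is too short for this sample_rate")
    for start in range(0, len(samples), max_len):
        yield samples[start:start + max_len]


def transcribe_bounded(
    samples: Sequence[float],
    sample_rate: int,
    transcribe_chunk: Callable[[Sequence[float], int], str],  # hypothetical decoder hook
    max_seconds: float = 30.0,
) -> str:
    """Decode a long segment piece by piece so no single input exceeds the cap."""
    texts = [
        transcribe_chunk(chunk, sample_rate)
        for chunk in split_into_chunks(samples, sample_rate, max_seconds)
    ]
    # Drop empty results and join the rest with single spaces.
    return " ".join(text for text in texts if text)
```

A hard cut like this can split a word in the middle, so it is best treated as a safety net behind the VAD rather than a replacement for it; the VAD + ASR example mentioned above remains the recommended route.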

csukuangfj commented 1 hour ago

closing since it is not a bug