Fledermaus-20 opened this issue 1 week ago
Describe the bug
When running tests against the TTS endpoint, I've observed that streaming the audio response takes nearly the same amount of time as receiving a fully generated audio file. This seems counterintuitive, since streaming should typically start delivering the response sooner, beginning with the first available data chunk. Below is the code for the streaming endpoint (a minimal sketch follows the file list).
To Reproduce
model_manager.py
tts_streaming.py
main.py
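
Since the file contents are not reproduced above, here is a minimal, self-contained sketch of what a streaming TTS endpoint of this shape might look like. FastAPI, the route path, and the dummy chunk generator are all assumptions for illustration, not the reporter's actual code:

```python
# Hedged sketch of a streaming TTS endpoint (stand-in for tts_streaming.py).
# FastAPI, the route, and the dummy generator are assumptions; a real model
# call would replace synthesize_chunks.
import time
from typing import Iterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class TTSRequest(BaseModel):
    text: str


def synthesize_chunks(text: str) -> Iterator[bytes]:
    # Dummy generator standing in for the model: yields one chunk every
    # 0.5 s. A real implementation must likewise yield chunk by chunk as
    # the model produces audio; collecting everything first and yielding
    # only at the end makes streaming no faster than a plain response.
    for _ in range(10):
        time.sleep(0.5)  # simulated per-chunk synthesis time
        yield b"\x00" * 4096


@app.post("/tts/stream")
def tts_stream(req: TTSRequest) -> StreamingResponse:
    # StreamingResponse flushes each yielded chunk to the client as soon
    # as it is produced, so here the first bytes arrive after ~0.5 s
    # rather than after the full ~5 s of synthesis.
    return StreamingResponse(synthesize_chunks(req.text), media_type="audio/wav")
```

The point of the sketch is the yield granularity: if the generator behind `StreamingResponse` only yields after synthesis is complete, the streaming route will take as long to produce its first byte as the non-streaming one.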
Expected behavior
I expect the first chunks of the TTS stream to arrive much sooner than the finished file would if I requested it in one piece.
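
To make that expectation measurable, here is a small client-side sketch that separates time-to-first-chunk from total time; the URL and payload are placeholders, not the project's actual API:

```python
# Measure time-to-first-chunk vs. total time for the streaming endpoint.
# URL and payload are hypothetical; adjust to the actual API.
import time

import requests

URL = "http://localhost:8000/tts/stream"
payload = {"text": "Hello world"}

start = time.perf_counter()
with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    first_chunk_at = None
    total_bytes = 0
    for chunk in resp.iter_content(chunk_size=4096):
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        total_bytes += len(chunk)
total = time.perf_counter() - start

print(f"time to first chunk: {first_chunk_at:.3f}s")
print(f"total time:          {total:.3f}s ({total_bytes} bytes)")
```

If time-to-first-chunk comes out close to the total time, the server-side generator is likely only yielding after synthesis has finished, or something in between (a proxy, middleware, or buffer) is accumulating the response before forwarding it.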
Logs
No response
Environment
Additional context
Thanks in advance for the help.

You didn't share any timings, so it's hard to say what's going on. Note that streaming inference may take longer in total than normal inference; only the first chunks should arrive faster. Could you share (once models are loaded/warmed up):

Sorry, I had forgotten. The times after the model is loaded/warmed up are: