jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Cuda out of memory error #197

Closed furqan4545 closed 1 year ago

furqan4545 commented 1 year ago
image

I am using the large-v2 model, and when I set the parameters shown in the picture, a CUDA out-of-memory error is thrown. Is there a memory leak, or is something else wrong?

Also, the transcription is not as accurate as the original Whisper model.

jianfch commented 1 year ago

ts_num was used for version 1.0 of Stable-ts to improve the timing, but it's experimental in the current version 2.0. The higher the value you use, the more memory it will use. Generally, avoid using that argument for best results.
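As a rough illustration of the advice above, the sketch below drops ts_num from a set of transcribe arguments before calling the model. The helper names build_transcribe_kwargs and transcribe_without_ts_num are hypothetical and not part of Stable-ts, and the value 16 in the usage comment is only a placeholder, since the actual settings from the screenshot are not available. It assumes the usual stable_whisper.load_model and model.transcribe entry points.

```python
def build_transcribe_kwargs(user_kwargs):
    """Return a copy of user_kwargs with the experimental ts_num removed.

    Hypothetical helper: it only filters a dict and does not touch Stable-ts.
    """
    cleaned = dict(user_kwargs)   # copy so the caller's dict is left unchanged
    cleaned.pop("ts_num", None)   # higher ts_num means more memory, so drop it
    return cleaned


def transcribe_without_ts_num(audio_path, model_name="large-v2", **user_kwargs):
    """Transcribe audio_path with any ts_num argument stripped out.

    Hypothetical wrapper; assumes stable_whisper is installed.
    """
    import stable_whisper  # imported lazily so the helper above has no dependencies

    model = stable_whisper.load_model(model_name)
    return model.transcribe(audio_path, **build_transcribe_kwargs(user_kwargs))


# Example usage (not run here; needs the model weights and an audio file):
# result = transcribe_without_ts_num('audio.mp3', ts_num=16)
```

Keeping the filtering in a separate function makes it easy to check what will actually be passed to transcribe before loading a large model onto the GPU.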

furqan4545 commented 1 year ago

Thank you so much for the response.

result2 = model.transcribe('tate_pier.mp3', mel_first=True, demucs=True)
result2 = model.transcribe('tate_pier.mp3', mel_first=True)

I tried the different settings shown above, both with and without VAD, but the accuracy is still not as good as the original Whisper. It sometimes misses a lot of words, even though I am using the large-v2 model. Could you please tell me if there is a specific parameter I can use so that it doesn't miss words? Your help will be highly appreciated.