ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation
MIT License

OpenAI Whisper is not working anymore as a backend for whisper_streaming #94

Open parth-singh71 opened 3 months ago

parth-singh71 commented 3 months ago

The OpenAI Whisper API response has changed: it no longer includes the "words" key, so whisper_streaming fails to run.

Please let me know whether you are planning to update the code for the new API response and, if so, when we can expect a release.

Thanks in advance

Gldkslfmsd commented 3 months ago

The link you sent has a "Verbose Json" section where the word timestamps are included.

Gldkslfmsd commented 3 months ago

No plans for now. A PR is welcome. Thanks!

parth-singh71 commented 3 months ago

https://github.com/ufal/whisper_streaming/issues/94#issuecomment-2142751355

The link you sent has a "Verbose Json" section where the word timestamps are included.

Yes, I have tried that, and it does not work. Try calling that API through the openai module: the response no longer has a "words" key. Instead there is a "tokens" key, which holds the token IDs of the transcribed text rather than word-level timestamps.

marcinmatys commented 3 months ago


To get the "words" key, the request needs response_format="verbose_json" and timestamp_granularities=["word"].
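A minimal sketch of such a request, assuming the openai Python package (version 1.x) and a client created with `openai.OpenAI()`. The function name and the `_get` helper are made up for illustration and are not part of whisper_streaming; the helper exists only because, depending on the package version, each word entry may arrive as a dict or as an object.

```python
from typing import Any, BinaryIO, List, Tuple


def _get(item: Any, key: str) -> Any:
    """Read a field from a dict-style or attribute-style word entry."""
    return item[key] if isinstance(item, dict) else getattr(item, key)


def transcribe_with_word_timestamps(
    client: Any, audio_file: BinaryIO
) -> List[Tuple[float, float, str]]:
    """Return (start, end, word) tuples from a transcription request.

    `client` is assumed to be an `openai.OpenAI()` instance.
    """
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        # Without this parameter the response carries no "words" key.
        timestamp_granularities=["word"],
    )
    # Treat a missing or empty "words" field as "no words returned".
    words = getattr(response, "words", None) or []
    return [(_get(w, "start"), _get(w, "end"), _get(w, "word")) for w in words]
```

Usage would look like `transcribe_with_word_timestamps(openai.OpenAI(), open("audio.wav", "rb"))`, where the file name is a placeholder.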

parth-singh71 commented 3 months ago


timestamp_granularities is only available for the transcription API. What about the translation API?

I am using whisper_streaming for a translation task.
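One possible stopgap, sketched here under loud assumptions: a translation request with response_format="verbose_json" still returns segment-level timestamps, so word times can be approximated by spreading each segment's duration over its words in proportion to their length. This is an interpolation, not real word alignment, and it is not what whisper_streaming does. The function name is hypothetical, and the segments are assumed to have been converted to plain dicts with "start", "end" and "text" keys.

```python
from typing import Any, Dict, List, Tuple


def approximate_word_timestamps(
    segments: List[Dict[str, Any]]
) -> List[Tuple[float, float, str]]:
    """Estimate (start, end, word) tuples from segment-level timestamps.

    Assumed input: the "segments" list from a call such as
    client.audio.translations.create(model="whisper-1", file=f,
                                     response_format="verbose_json"),
    converted to dicts. The timings are guesses, not alignments.
    """
    result: List[Tuple[float, float, str]] = []
    for seg in segments:
        words = seg["text"].split()
        if not words:
            continue  # nothing to place in an empty segment
        total_chars = sum(len(w) for w in words)
        duration = seg["end"] - seg["start"]
        cursor = seg["start"]
        for w in words:
            # Longer words get a proportionally larger share of the segment.
            span = duration * len(w) / total_chars
            result.append((cursor, cursor + span, w))
            cursor += span
    return result
```

Because the word boundaries are only estimated, anything that relies on precise word timing should be expected to behave worse with this input than with real word timestamps.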

kennette21 commented 2 months ago

I also just ran into the timestamp_granularities issue with translation. I am also not clear on how to set the target language for translation. Does it always translate into English?

Gldkslfmsd commented 2 months ago

Yes, to English. It sometimes translates into other languages, but that is an undocumented, low-quality feature.
