thomasmol / cog-whisper-diarization

Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote
https://replicate.com/thomasmol/whisper-diarization
165 stars, 51 forks

Whisper-v3 #1

Closed: suryasanchez closed this issue 11 months ago

suryasanchez commented 12 months ago

Update to support Whisper v3? https://github.com/openai/whisper/discussions/1762

thomasmol commented 12 months ago

Yes, I am working on it. Currently there are issues converting Whisper v3 to CTranslate2: https://github.com/guillaumekln/faster-whisper/issues/544#issuecomment-1802818862

thomasmol commented 11 months ago

Currently waiting on this PR to get merged: https://github.com/huggingface/transformers/pull/26699 so we can use batched inference and still get word-level timestamps. I'll move away from faster-whisper (the maintainer is now working at Apple and is no longer actively maintaining the repo) and use the Hugging Face transformers implementation described here: https://huggingface.co/openai/whisper-large-v3. That should result in even faster inference than faster-whisper. Hopefully sometime this week or next!
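For anyone following along, here is a rough sketch of what batched inference with word-level timestamps could look like through the transformers ASR pipeline. It is not the code in this repo. The helper names `build_asr_pipeline` and `flatten_word_chunks`, the batch size of 16, and the 30-second chunk length are assumptions for illustration, and whether word timestamps work together with batching depends on the transformers version installed.

```python
def build_asr_pipeline(model_id="openai/whisper-large-v3", batch_size=16):
    """Create a chunked, batched Whisper ASR pipeline.

    Imports are local so this file can be loaded without transformers/torch
    installed; calling this function does require both packages.
    """
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        device="cuda:0" if use_cuda else "cpu",
        chunk_length_s=30,      # split long audio into 30 s chunks (assumed value)
        batch_size=batch_size,  # run several chunks per forward pass (assumed value)
    )


def flatten_word_chunks(result):
    """Turn the pipeline's {"chunks": [...]} output into plain word dicts."""
    words = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        words.append({"word": chunk["text"].strip(), "start": start, "end": end})
    return words
```

Usage would be along the lines of `flatten_word_chunks(build_asr_pipeline()("audio.wav", return_timestamps="word"))`, where `audio.wav` is a placeholder path.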

thomasmol commented 11 months ago

Faster-whisper 0.10.0 was released, which includes support for Whisper v3. Faster-whisper also has a new maintainer and moved here: https://github.com/SYSTRAN/faster-whisper. I just updated the pipeline to the new version of faster-whisper and to the new version of pyannote (3.1). I will still look into batched inference when that's possible, but only once it supports VAD and word-level timestamps.
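As a minimal sketch of the faster-whisper route with both VAD and word-level timestamps enabled (not the actual predict code of this repo): the function names `transcribe_with_words` and `segment_to_dict`, the `cuda` device, and the `float16` compute type are assumptions for illustration.

```python
def segment_to_dict(segment):
    """Convert a faster-whisper segment (with optional .words) to plain dicts."""
    return {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text.strip(),
        "words": [
            {"word": w.word.strip(), "start": w.start, "end": w.end}
            for w in (segment.words or [])  # .words is None without word timestamps
        ],
    }


def transcribe_with_words(audio_path, model_size="large-v3"):
    """Transcribe with Whisper v3 via faster-whisper (>= 0.10.0 assumed).

    The import is local so this file can be loaded without faster-whisper
    installed; calling this function does require it and a CUDA GPU.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, info = model.transcribe(
        audio_path,
        vad_filter=True,       # skip non-speech parts with the built-in VAD
        word_timestamps=True,  # attach per-word start/end times to each segment
    )
    # `segments` is a lazy generator; iterating it runs the transcription.
    return [segment_to_dict(s) for s in segments], info.language
```

The per-word start and end times are what a diarization step (pyannote here) can be matched against to attribute words to speakers.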