Patrick10731 opened 6 days ago
Models that have preconfigured alignment heads, or that are compatible with the heads of the original OpenAI models, will work.
For models compatible with the original heads, you can configure them manually by assigning the head indices to model._pipe.model.generation_config.alignment_heads.
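Below is a minimal sketch of that manual configuration. It assumes model was returned by stable_whisper.load_hf_whisper and that the checkpoint keeps the decoder layout of the model whose heads you copy. The helper names heads_fit and set_alignment_heads are hypothetical, and model._pipe is a private attribute that may change between stable-ts releases. The check against decoder_layers and decoder_attention_heads (fields of a transformers WhisperConfig) only rejects indices that cannot exist. It does not verify that the heads actually give good alignments.

```python
from typing import List, Sequence


def heads_fit(heads: Sequence[Sequence[int]], num_layers: int, num_heads: int) -> bool:
    """Return True if every (layer, head) pair indexes an existing decoder attention head."""
    return all(0 <= layer < num_layers and 0 <= head < num_heads for layer, head in heads)


def set_alignment_heads(model, heads: Sequence[Sequence[int]]) -> List[List[int]]:
    """Hypothetical helper: validate and assign alignment heads on a model
    returned by stable_whisper.load_hf_whisper.

    It reaches through the private attribute model._pipe, which may change
    between stable-ts versions.
    """
    hf_model = model._pipe.model  # underlying transformers Whisper model
    cfg = hf_model.config         # transformers WhisperConfig
    if not heads_fit(heads, cfg.decoder_layers, cfg.decoder_attention_heads):
        raise ValueError(
            f"alignment heads do not fit a decoder with {cfg.decoder_layers} layers "
            f"and {cfg.decoder_attention_heads} heads per layer"
        )
    # Store plain [layer, head] lists, the shape used in Whisper generation configs
    normalized = [[int(layer), int(head)] for layer, head in heads]
    hf_model.generation_config.alignment_heads = normalized
    return normalized
```

As a usage sketch, assuming a fine-tune that keeps the decoder of openai/whisper-large-v3 (the id 'your-org/whisper-large-v3-finetune' is a placeholder): load it with stable_whisper.load_hf_whisper('your-org/whisper-large-v3-finetune', device='cpu'), then pass GenerationConfig.from_pretrained('openai/whisper-large-v3').alignment_heads (GenerationConfig comes from transformers) to set_alignment_heads. A distilled model with fewer decoder layers would typically be rejected, because the original heads point at layers it no longer has.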
Technically, even models without alignment heads, such as distil-large-v2, will work if you disable word timestamps with model.transcribe('audio.mp3', word_timestamps=False). However, many features, such as regrouping and word-level timestamp adjustment, will be unavailable.
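One way to apply this automatically is sketched below. It assumes model came from stable_whisper.load_hf_whisper and treats a non-empty generation_config.alignment_heads as the signal that word timestamps are available. The helper names supports_word_timestamps and transcribe_with_fallback are hypothetical, and the code relies on the private model._pipe attribute.

```python
def supports_word_timestamps(model) -> bool:
    """Return True if the underlying Hugging Face model has alignment heads configured.

    Assumes model came from stable_whisper.load_hf_whisper (uses the private model._pipe).
    """
    generation_config = model._pipe.model.generation_config
    return bool(getattr(generation_config, "alignment_heads", None))


def transcribe_with_fallback(model, audio):
    """Hypothetical helper: request word timestamps only when alignment heads exist.

    Returns (result, word_timestamps) so callers know which output options still apply.
    """
    word_timestamps = supports_word_timestamps(model)
    result = model.transcribe(audio, word_timestamps=word_timestamps)
    return result, word_timestamps
```

For example, with model = stable_whisper.load_hf_whisper('distil-whisper/distil-large-v2', device='cpu'), calling result, has_words = transcribe_with_fallback(model, 'audio.mp3') and then result.to_srt_vtt('audio.srt', word_level=has_words) requests word-level subtitles only when word timestamps were actually computed.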
I tried to use distil-whisper-v3 in stable-ts, and it works. However, it does not work when I try to use "distil-large-v2", and other models fail too (e.g. kotoba-whisper, "kotoba-tech/kotoba-whisper-v1.0"). Which models other than OpenAI's can be used in stable-ts?
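import stable_whisper
# Load a Hugging Face Whisper checkpoint (distil-large-v3) on the CPU
model = stable_whisper.load_hf_whisper('distil-whisper/distil-large-v3', device='cpu')
result = model.transcribe('audio.mp3')
# Write segment-level subtitles (word_level=False)
result.to_srt_vtt('audio.srt', word_level=False)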