jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Huggingface's Fine Tuned model that can be used? #378

Open Patrick10731 opened 4 months ago

Patrick10731 commented 4 months ago

I tried to use distil-whisper-v3 in stable-ts and it works. However, it doesn't work when I try "distil-large-v2". Other models can't be used either (e.g. kotoba-whisper, "kotoba-tech/kotoba-whisper-v1.0"). What kinds of models can be used in stable-ts besides OpenAI's models?

import stable_whisper

model = stable_whisper.load_hf_whisper('distil-whisper/distil-large-v3', device='cpu')
result = model.transcribe('audio.mp3')

result.to_srt_vtt('audio.srt', word_level=False)

jianfch commented 4 months ago

Models with preconfigured alignment heads, or ones compatible with the original heads, will work. For the latter, you can configure them manually by assigning the head indices to model._pipe.model.generation_config.alignment_heads.

Technically, even models without alignment heads, such as distil-large-v2, will work if you disable word timestamps with model.transcribe('audio.mp3', word_timestamps=False). However, many features, such as regrouping and word-level timestamp adjustment, will be unavailable.
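
For reference, a rough sketch of both options (the model id and head indices below are placeholders for illustration, not values for any particular model):

import stable_whisper

# 'some-org/some-finetuned-whisper' is a placeholder for whatever model you are loading
model = stable_whisper.load_hf_whisper('some-org/some-finetuned-whisper', device='cpu')

# option 1: for a model whose heads are compatible with the original heads,
# assign the (layer, head) index pairs manually (placeholder values shown)
model._pipe.model.generation_config.alignment_heads = [[3, 1], [4, 2], [5, 0]]
result = model.transcribe('audio.mp3')

# option 2: for a model without usable alignment heads (e.g. distil-large-v2),
# disable word timestamps instead
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)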

dgoryeo commented 1 month ago

Hi @Patrick10731 , did you get any of the kotoba-whisper models to work with stable-ts? I am trying their kotoba-tech/kotoba-whisper-v2.1 model, but I keep getting an out-of-memory error.

@jianfch , I'm not sure if you have already come across the kotoba-tech models on Huggingface. Their latest model uses stable-ts for accurate timestamps and regrouping. I thought you might be interested.

Patrick10731 commented 1 month ago

@jianfch Thanks, it worked.

@dgoryeo I confirmed that the following code works; give it a try.


import stable_whisper

model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v1.1', device='cpu')
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)

I also found that many models that still won't work this way will work if you convert them into a faster-whisper model.

For example, this model won't work:

import stable_whisper

model = stable_whisper.load_hf_whisper('Scrya/whisper-large-v2-cantonese', device='cpu')
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)

But the following code will work:

import stable_whisper

model = stable_whisper.load_faster_whisper('XA9/faster-whisper-large-v2-cantonese-2', device='cpu', compute_type='default')
result = model.transcribe_stable('audio.mp3')
result.to_srt_vtt('audio.srt', word_level=False)

The converted model is from https://huggingface.co/XA9/faster-whisper-large-v2-cantonese-2, and it was produced with the following command:

 ct2-transformers-converter --model Scrya/whisper-large-v2-cantonese --output_dir faster-whisper-large-v2-cantonese-2 --copy_files  preprocessor_config.json --quantization float16

So I recommend trying to convert a model if it won't work directly.

dgoryeo commented 1 month ago

Thank you @Patrick10731 , by any chance have you tried Kotoba's v2.1 (which is a distilled Whisper)?

I will try to follow your recommendation. At the moment I am running out of memory with v2.1, but I haven't tried CPU only; I've only tried device='cuda' so far.

Patrick10731 commented 1 month ago

@dgoryeo I tried with this code and it worked. How about trying device='cpu'? The out-of-memory error is probably because your video card doesn't have enough memory.


import stable_whisper

model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cpu')
result = model.transcribe('audio.mp3', word_timestamps=False)

result.to_srt_vtt('audio.srt', word_level=False)

dgoryeo commented 1 month ago

Thanks @Patrick10731 , I will test it on CPU. I have 12 GB of GPU VRAM, so I didn't expect to run out of memory. I'll test and report back.

jianfch commented 1 month ago

@dgoryeo 12GB might be too low for the default batch_size=24. Try a smaller batch_size.
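
For example, something like this (assuming batch_size can be passed to transcribe() here; check the signature in your stable-ts version):

import stable_whisper

model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cuda')
# a smaller batch_size lowers peak VRAM usage at the cost of some speed
result = model.transcribe('audio.mp3', word_timestamps=False, batch_size=8)

result.to_srt_vtt('audio.srt', word_level=False)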

dgoryeo commented 1 month ago

@jianfch , that must be it. I'll change the batch_size accordingly.

When I use the model directly with transformers, I use batch_size 16 with no problem:

    # transformers pipeline setup; model_id, torch_dtype, device, and
    # model_kwargs are defined elsewhere in my script
    from transformers import pipeline

    pipe = pipeline(
        model=model_id,
        torch_dtype=torch_dtype,
        device=device,
        model_kwargs=model_kwargs,
        chunk_length_s=15,
        batch_size=16,
        trust_remote_code=True,
        stable_ts=True,
        punctuator=True
    )

Thanks

jianfch commented 1 month ago

@dgoryeo You can pass this pipe directly to the pipeline parameter of stable_whisper.load_hf_whisper().
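
For example, something along these lines (a sketch; the exact signature of load_hf_whisper() may differ in your version):

import stable_whisper

# 'pipe' is the transformers pipeline built above; stable-ts will use it
# instead of constructing its own
model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cuda', pipeline=pipe)
result = model.transcribe('audio.mp3', word_timestamps=False)
result.to_srt_vtt('audio.srt', word_level=False)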

dgoryeo commented 1 month ago

Reporting back that it worked.

I tested both options: (a) calling model = stable_whisper.load_hf_whisper('kotoba-tech/kotoba-whisper-v2.1', device='cuda') directly, and (b) passing the pipe to the pipeline parameter of stable_whisper.load_hf_whisper(), with device='cuda'.

Both worked, though I was happier with the results of (a).