huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

Is this supposed to support large-v3? #36

Closed captainyugi00 closed 2 months ago

captainyugi00 commented 7 months ago

Hello, does this model support large-v3?

sanchit-gandhi commented 7 months ago

Distilling large-v3 now! It's a ~2 week process if you include pseudo-labelling the data and training the model
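For readers unfamiliar with the process: pseudo-labelling means transcribing the training audio with the teacher model and using those transcripts as the student's targets. A minimal sketch with the transformers pipeline (the model id and file name are placeholders, not the repo's actual training pipeline):

```python
from transformers import pipeline

# The teacher produces the pseudo-labels that the student is distilled on.
teacher = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")

# "train_clip.wav" stands in for one file of the training corpus.
pseudo_label = teacher("train_clip.wav")["text"]
print(pseudo_label)
```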

salahzoubi commented 7 months ago

@sanchit-gandhi will batched transcription be supported, like it is in large-v3?

rawwerks commented 7 months ago

> Distilling large-v3 now! It's a ~2 week process if you include pseudo-labelling the data and training the model

This will change the game!

...Especially once @ggerganov gets his hands on it (for https://github.com/ggerganov/whisper.cpp)

mrubiottec commented 7 months ago

Looking forward to large-v3 being distilled! :)

timchenxiaoyu commented 6 months ago

wait

sanchit-gandhi commented 5 months ago

Running the distil-large-v3 training now with some updates that should give better long-form WER performance using OpenAI's long-form algorithm. This should also translate to WER improvements in other libraries like faster-whisper and whisper.cpp!

Broadly speaking, the changes are:

  1. Freezing the decoder embeddings (giving good WER improvements on long-form audio; a short sketch follows after this comment)
  2. Training on a longer sequence length (256 vs. the 128 we had before, giving WER improvements on short and long-form audio)
  3. Training directly on the token ids (see `decode_token_ids` here)

Training run logs: https://wandb.ai/sanchit-gandhi/distil-whisper?workspace=user-sanchit-gandhi
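For illustration only, here is a minimal sketch of what point 1 above (freezing the decoder embeddings) can look like with the transformers `WhisperForConditionalGeneration` API. The attribute paths follow the upstream Whisper implementation; this is a sketch under those assumptions, not the repo's actual training code.

```python
from transformers import WhisperForConditionalGeneration

# Load a student checkpoint (distil-large-v2 used purely as a stand-in).
model = WhisperForConditionalGeneration.from_pretrained("distil-whisper/distil-large-v2")

# Freeze the decoder's token embedding table so it is not updated during
# distillation (note: Whisper ties these weights to the output projection).
model.model.decoder.embed_tokens.requires_grad_(False)

# The learned positional embeddings can be frozen the same way.
model.model.decoder.embed_positions.requires_grad_(False)
```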

Mijawel commented 5 months ago

Is distil-large-v3 finished?

v3ss0n commented 5 months ago

When will large-v3 be distilled?

hlevring commented 5 months ago

Good things take a little time ;)

sanchit-gandhi commented 5 months ago

I'd say the ETA is about 1-2 weeks. Getting really promising long-form WER results now (within 1.3% WER of large-v3 using OpenAI's long-form transcription algorithm). Currently training the model so that it works with `condition_on_prev_text`.
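For context, a rough sketch of what using that option looks like at inference time, based on the transformers long-form generation API (the parameter is named `condition_on_prev_tokens` there; the model id and dummy audio below are placeholders):

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")

# Placeholder input: 60 seconds of noise sampled at 16 kHz.
audio = np.random.randn(16_000 * 60).astype(np.float32)

# Keep the full waveform (no 30 s truncation) for sequential long-form decoding.
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt",
                   truncation=False, padding="longest",
                   return_attention_mask=True)

# condition_on_prev_tokens feeds each segment's output back in as a prompt
# for the next segment, matching OpenAI's long-form algorithm.
generated = model.generate(**inputs, condition_on_prev_tokens=True,
                           return_timestamps=True)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```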

qwopqwop200 commented 5 months ago

Does Distill Large v3 support multilingual, or does it only support English?

AiDreamerOoO commented 4 months ago

> I'd say the ETA is about 1-2 weeks. Getting really promising long-form WER results now (within 1.3% WER of large-v3 using OpenAI's long-form transcription algorithm). Currently training the model so that it works with `condition_on_prev_text`.

Keep it up!

GeorgesLeYeti commented 4 months ago

Any news on large-v3?

MrRace commented 4 months ago

Wish

solaoi commented 2 months ago

Great work, thanks! https://huggingface.co/distil-whisper/distil-large-v3
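To close the loop (and to answer the earlier question about batched transcription), a minimal usage sketch for the released checkpoint with the transformers pipeline; `chunk_length_s` and `batch_size` are standard pipeline arguments, but the specific values and the file name below are assumptions:

```python
from transformers import pipeline

# Chunked long-form transcription; chunks are decoded in parallel batches.
asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",
    chunk_length_s=25,  # assumed chunk length for the distilled model
    batch_size=8,       # number of chunks decoded per forward pass
)

print(asr("audio.wav")["text"])  # "audio.wav" is a placeholder path
```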