huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

large-v2 drops speech when transcribing English #100

Open machenme opened 3 months ago

machenme commented 3 months ago

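When I use distil-whisper-large-v2 for speech-to-text, some speech is lost. My data is here: https://youtu.be/zYC7tKfKPtM?si=Rm2ZK9Rez5E9CSlP
At 00:01:26, the whole sentence "On the right-hand side, you put any expression that you want." is missing: the output jumps from 00:01:25,430 straight to 00:01:30,240. This is the distil-whisper-large-v2 output: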

00:01:20,750 --> 00:01:23,770
An assignment statement has a name on the left-hand side.

24
00:01:23,890 --> 00:01:25,430
It can be any name that you invent.

25
00:01:30,240 --> 00:01:35,780
will evaluate that expression and bind it to the name. So now radius is bound

26
00:01:35,780 --> 00:01:44,250
to the value 10. Two times radius is 20. I can use that name when I bind other

Here is the faster-whisper-large-v2 output for the same audio, which keeps the sentence:

00:01:20,710 --> 00:01:23,770
An assignment statement has a name on the left-hand side.

24
00:01:23,930 --> 00:01:25,430
It could be any name that you invent.

25
00:01:26,200 --> 00:01:29,340
On the right-hand side, you put any expression that you want.

26
00:01:29,720 --> 00:01:31,720
And Python will evaluate that expression

27
00:01:31,720 --> 00:01:33,600
and bind it to the name.

28
00:01:34,300 --> 00:01:36,660
So now radius is bound to the value 10.
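
One way to spot this kind of dropped segment without re-listening to the whole recording is to scan the SRT timeline for unusually long pauses between consecutive cues. The sketch below is a hypothetical helper, not part of distil-whisper or faster-whisper: `find_gaps`, `to_seconds` and the 2-second `min_gap` threshold are assumptions. A long gap can also be genuine silence, so flagged spans still need a listen.

```python
import re

# Matches an SRT timing line such as "00:01:23,890 --> 00:01:25,430".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert SRT time fields (given as strings) to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def find_gaps(srt_text, min_gap=2.0):
    """Return (prev_end, next_start) pairs, in seconds, where consecutive
    cues are at least `min_gap` seconds apart (hypothetical threshold)."""
    cues = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        cues.append((to_seconds(*fields[:4]), to_seconds(*fields[4:])))
    gaps = []
    # Compare the end of each cue with the start of the next one.
    for (_, prev_end), (next_start, _) in zip(cues, cues[1:]):
        if next_start - prev_end >= min_gap:
            gaps.append((prev_end, next_start))
    return gaps


# Excerpt of the distil-whisper-large-v2 output from this issue.
distil_srt = """
00:01:20,750 --> 00:01:23,770
An assignment statement has a name on the left-hand side.

24
00:01:23,890 --> 00:01:25,430
It can be any name that you invent.

25
00:01:30,240 --> 00:01:35,780
will evaluate that expression and bind it to the name. So now radius is bound
"""

print(find_gaps(distil_srt))
```

On the distil excerpt this flags the single span between 00:01:25,430 and 00:01:30,240, while the faster-whisper cues above never pause for more than about a second.
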

sanchit-gandhi commented 3 months ago

Could you try with distil-large-v3? It implements training improvements that should make it more performant in Faster-Whisper: https://huggingface.co/distil-whisper/distil-large-v3#faster-whisper