m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

repetitive words in transcribe #576

Open MyraBaba opened 11 months ago

MyraBaba commented 11 months ago

I am processing an hour-long WAV file with the command below. A few parts are transcribed as the same words repeated over and over, such as: [SPEAKER01] As in there As in there As in there As in there As in there As in there ... (the same phrase repeats dozens of times)

The command is: whisperx /data/WORKS/RD/Rd.wav --lang xx --model large --model_dir MODELSx/ --device cuda --diarize --hf_token xxxxxxxxxxxxx

How can I fix this? Is it possible?
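
To locate the affected parts of a long transcript, a small script can flag phrases that repeat back-to-back. The following is a minimal sketch, not part of WhisperX: the function name `find_repeated_runs` and its thresholds are hypothetical, and it only detects the symptom rather than fixing its cause.

```python
from typing import List, Tuple


def find_repeated_runs(
    text: str, max_phrase_len: int = 5, min_repeats: int = 4
) -> List[Tuple[str, int]]:
    """Return (phrase, repeat_count) for phrases repeated back-to-back.

    Hypothetical helper: phrases are compared word by word and
    case-sensitively; the thresholds are arbitrary starting points.
    """
    words = text.split()
    runs: List[Tuple[str, int]] = []
    i = 0
    while i < len(words):
        best = None  # (phrase length in words, repeat count)
        for n in range(1, max_phrase_len + 1):
            phrase = words[i:i + n]
            if len(phrase) < n:
                break  # not enough words left for a phrase of this length
            count = 1
            # Count how many times the phrase immediately follows itself.
            while words[i + count * n:i + (count + 1) * n] == phrase:
                count += 1
            covered = count * n
            if count >= min_repeats and (best is None or covered > best[0] * best[1]):
                best = (n, count)
        if best is not None:
            n, count = best
            runs.append((" ".join(words[i:i + n]), count))
            i += n * count  # skip past the whole repeated run
        else:
            i += 1
    return runs


sample = "ok then " + "As in there " * 10 + "and so on"
print(find_repeated_runs(sample))
```

Running this over each speaker segment of the output would show which time ranges contain the loop, which makes it easier to cut out only those ranges for retesting.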

dmazurok commented 11 months ago

I get the same behavior when I suppress numeric tokens on segments with a lot of digits.

MyraBaba commented 11 months ago

@dmazurok @prashanthellina

When I cut the problematic part of the audio out as a 5-minute clip, it transcribes successfully. But in the full 4-hour audio, the output is garbled.
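
Since the short clip transcribes cleanly, one possible workaround is to transcribe the long file in pieces and shift the timestamps back afterwards. The sketch below only does the bookkeeping: `plan_chunks` and `shift_segments` are hypothetical helpers, the 300-second default mirrors the 5-minute clip above, and the segment layout (dicts with `start` and `end` in seconds) is an assumption. Fixed boundaries may cut through a word, and this is untested as a fix for the repetition itself.

```python
from typing import Dict, List, Tuple


def plan_chunks(
    total_s: float, chunk_s: float = 300.0, overlap_s: float = 0.0
) -> List[Tuple[float, float]]:
    """Return (start, end) spans in seconds that cover [0, total_s]."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    spans: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break
        start = end - overlap_s  # step back so neighbouring chunks overlap
    return spans


def shift_segments(segments: List[Dict], offset_s: float) -> List[Dict]:
    """Move chunk-relative timestamps back onto the full-file timeline.

    Assumes each segment is a dict with numeric "start" and "end" keys.
    """
    return [
        {**seg, "start": seg["start"] + offset_s, "end": seg["end"] + offset_s}
        for seg in segments
    ]


# A 4-hour file split into 5-minute pieces.
spans = plan_chunks(4 * 3600)
print(len(spans), spans[0], spans[-1])
```

Each span would be cut from the audio, transcribed on its own, and its segments passed through `shift_segments` with the span's start time as the offset.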

dmazurok commented 11 months ago

I get it even on shorter audio, around 40 minutes long, and the error itself starts even earlier. It happens on both GPU and CPU. I will try cutting the audio later, just to test.

fznrs commented 1 month ago

Did you guys ever find a reason or solution for the issue? I'm experiencing the same thing.