Closed ItakeLs closed 1 year ago
Thanks a lot for reporting and giving the audio @ItakeLs. I'll have a look ASAP.
Are you using the command line (CLI) or calling the Python function `transcribe`? (I think the default options differ.)
Also, to be sure, what is your version (`whisper_timestamped -v` in the CLI / `whisper_timestamped.__version__` in Python)?
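For reference, the version check from Python can be sketched like this (guarded with a fallback so the snippet degrades gracefully in an environment where the package is not installed):

```python
# Print the installed whisper_timestamped version, or a placeholder string
# when the package is not available in this environment.
try:
    import whisper_timestamped
    version = whisper_timestamped.__version__
except ImportError:
    version = "not installed"

print(version)
```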
And are you running on GPU or CPU? (It's unfortunate to have to ask, but there seems to be a butterfly effect that makes results significantly different on different devices.)
"condition_on_previous_text" is true
Actually it's already True by default. Are you sure about this?
It's fine: `condition_on_previous_text` is True. In the last issue I sent it was False, so I mentioned it for clarification; sorry for the confusion. I am using the `transcribe` function, not the CLI. I am on the latest version; `whisper_timestamped.__version__` reports "1.7.5". I am also running on GPU.
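For context, a call through the Python API (as opposed to the CLI) looks roughly like the sketch below, following the package's documented usage pattern. The file name is a placeholder, not the reporter's actual script, and the snippet is guarded so it degrades gracefully where the package or the audio file is unavailable:

```python
# Sketch of calling whisper_timestamped from Python instead of the CLI.
# "audio.wav" is a placeholder; condition_on_previous_text=True matches the
# setting discussed in this thread.
try:
    import whisper_timestamped as whisper

    model = whisper.load_model("medium.en", device="cuda")
    audio = whisper.load_audio("audio.wav")  # hypothetical file name
    result = whisper.transcribe(model, audio,
                                condition_on_previous_text=True)
    segments = result["segments"]
except Exception:  # package missing or file absent in this environment
    segments = []

print(len(segments))
```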
Edit: Just tried it again. Did you make any changes regarding this? I believe it is working now, at least on Google Colab; let me try it where I initially got the error.
Thanks for the feedback
> Just tried it again, did you make any changes regarding this? because I believe it working now at least on google colab, let me try it where I initially got the error.
I fixed some issues in the meantime, but nothing related to the issue you saw (I think).
But it's possible that this issue appears very seldom, under very specific conditions (e.g., hard to reproduce with a different GPU card), so I'm interested to know whether you are able to reproduce it. For the moment I'm not.
Yeah, I think you fixed the issue; at least it is not giving an error with that audio anymore. I'll continue testing with different audio files to see if I can reproduce it again.
I have this same problem, using commit 0c4e015510089a3e42081c7cddb82931d7b4b5dd with GPU. It seems the problem occurs only with the .en models, not with the multilingual models. I ran transcription with the tiny, tiny.en, medium, and medium.en models on a set of 1000 WAVs and got the 'inconsistent length for segment' error once with each .en model, but for different audio files.
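A batch test like the one described above can be sketched as follows. The model list comes from the comment; the `run_batch` helper and the stand-in transcriber are hypothetical, shown only to illustrate how the failing (model, file) pairs can be collected:

```python
MODELS = ["tiny", "tiny.en", "medium", "medium.en"]  # models from the comment

def run_batch(transcribe_fn, wav_paths, models):
    """Run transcribe_fn(model, path) over every (model, wav) pair and
    collect the pairs that trigger the 'inconsistent' assertion."""
    failures = []
    for model in models:
        for path in wav_paths:
            try:
                transcribe_fn(model, path)
            except AssertionError as e:
                if "nconsistent" in str(e):  # matches the reported message
                    failures.append((model, path))
    return failures

# Stand-in transcriber for illustration: fails on one specific file,
# and only with the .en models, mimicking the behavior reported above.
def fake_transcribe(model, path):
    if model.endswith(".en") and path == "bad.wav":
        raise AssertionError("Inconsistent number of segments")

print(run_batch(fake_transcribe, ["ok.wav", "bad.wav"], MODELS))
```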
Wow! Thanks a lot for this thorough investigation @solismaa
I have an idea of how to fix this, but to be sure I have to find a way to reproduce it. One chance in 1000 with the .en models... OK, it's going to be challenging...
I have the same error with the latest version and multilingual models, running on GPU.
I've seen this too (on CPU with the medium.en model). In my case `--accurate` helped; maybe that's worth a try.
I fixed two possible errors that could occur (especially with the .en models), but I never saw the exact error reported in the title of this issue.
So @solismaa @a-rogalska @misutoneko @Mike327327 and others who encounter the same issue: can you please (if you have time) retry with the latest version, and report back if it still fails?
Especially if you see the error on CPU, it will be easier for me to reproduce.
And really... it's impossible to fix an issue that I can't reproduce...
Yeah, it still fails for me (I'm at 219699c). Please note, though, that this may be a separate issue, as I haven't tested with that YouTube video, only with my own (mostly very short) clips. The error seems to happen when there is no actual speech in the WAV file, and so far I've only seen it with medium.en (I think). With small.en and tiny.en everything seems to run without errors, at least with this sample.
Here's a sample and the corresponding log: clip_652.zip
```
$ /usr/local/bin/whisper_timestamped --threads 4 --language en --device cpu --output_format srt --model medium.en --output_dir . clip_652_132492_Inconsistent_number_of_segs.wav
/usr/local/lib/python3.8/dist-packages/whisper/transcribe.py:77: UserWarning: Performing inference on CPU when CUDA is available
  warnings.warn("Performing inference on CPU when CUDA is available")
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 140/140 [01:23<00:00, 1.65frames/s]
WARNING:whisper_timestamped:Inconsistent number of segments: whisper_segments (2) != timestamped_word_segments (1)
Traceback (most recent call last):
  File "/usr/local/bin/whisper_timestamped", line 11, in <module>
    load_entry_point('whisper-timestamped==1.7.8', 'console_scripts', 'whisper_timestamped')()
  File "/usr/local/lib/python3.8/dist-packages/whisper_timestamped-1.7.8-py3.8.egg/whisper_timestamped/transcribe.py", line 1461, in cli
  File "/usr/local/lib/python3.8/dist-packages/whisper_timestamped-1.7.8-py3.8.egg/whisper_timestamped/transcribe.py", line 216, in transcribe_timestamped
  File "/usr/local/lib/python3.8/dist-packages/whisper_timestamped-1.7.8-py3.8.egg/whisper_timestamped/transcribe.py", line 635, in _transcribe_timestamped_efficient
AssertionError: Inconsistent number of segments: whisper_segments (2) != timestamped_word_segments (1)
```
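The failure in the traceback is a length-consistency assertion: the list of segments returned by Whisper and the list of word-timestamped segments built by whisper_timestamped must have the same length. A minimal, self-contained sketch of such a check (the helper name is hypothetical; only the message format mirrors the log):

```python
def check_segments(whisper_segments, timestamped_word_segments):
    # Illustrative version of the consistency check seen in the traceback
    # above: both segment lists are expected to be the same length.
    if len(whisper_segments) != len(timestamped_word_segments):
        raise AssertionError(
            "Inconsistent number of segments: whisper_segments "
            f"({len(whisper_segments)}) != timestamped_word_segments "
            f"({len(timestamped_word_segments)})"
        )

# The failing case from the log: 2 Whisper segments vs. 1 word-timestamped
# segment, as happens when a segment yields no word-level timestamps.
try:
    check_segments([{}, {}], [{}])
except AssertionError as e:
    print(e)
```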
Thank you so much @misutoneko, I could reproduce it easily thanks to what you posted :) It's awesome that you provided a short audio clip, which allows reproducing both on CPU and on GPU.
This issue should be fixed now.
Hello again, I have some more reproducible errors for you: "Got start time outside of audio boundary" and "Inconsistent number of segments: whisper_segments (462) != timestamped_word_segments (461)".
This YouTube video can reproduce the error: youtube I downloaded the mp4 file of the YouTube video from here: youtube downloader
"condition_on_previous_text" is true and the rest of the parameters are default settings