kadirnar / whisper-plus

WhisperPlus: Faster, Smarter, and More Capable 🚀
Apache License 2.0
1.67k stars 133 forks

TypeError: unsupported operand type(s) for -: 'NoneType' and 'float' #119

Open IzzyHibbert opened 1 month ago

IzzyHibbert commented 1 month ago

I am using:

WhisperPlus v0.3.0, CUDA 12.1. It throws the following error:

Traceback (most recent call last):
  File "D:\02-GENAI\whisper-plus\App_Diarization.py", line 17, in <module>
    output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\02-GENAI\whisper-plus\whisperplus\pipelines\whisper_diarize.py", line 171, in __call__
    upto_idx = np.argmin(np.abs(end_timestamps - end_time))
                                ~~~~~~~~~~~~~~~^~~~~~~~~~
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

This happens when I run the following code (which follows the repo instructions):

from whisperplus.pipelines.whisper_diarize import ASRDiarizationPipeline
from whisperplus import download_youtube_to_mp3, format_speech_to_dialogue

audio_path = download_youtube_to_mp3("https://www.youtube.com/watch?v=3p-VBQ4l-4s") 

device = "cuda"  # cuda / cpu / mps
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization-3.1",
    #use_auth_token=False,
    use_auth_token="here_comes_my_token",
    chunk_length_s=30,
    device=device,
)
output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)

Any ideas?

kadirnar commented 1 month ago

Didn't download the mp3 file. Is the link correct?

IzzyHibbert commented 1 month ago

> Didn't download the mp3 file. Is the link correct?

Yes, works. Even a copy+paste in browser works for me.

kadirnar commented 1 month ago

Sometimes there is a problem while downloading. That's why you have to try again and again.

IzzyHibbert commented 1 month ago

> Sometimes there is a problem while downloading. That's why you have to try again and again.

No, my file was downloaded. The error occurs afterwards; you can see that it fails at the speakers line: whisper_diarize.py, line 171.

IzzyHibbert commented 1 month ago

Also, I got another video to work using CPU on the same machine (so to me it is not an issue with the installation).

kadirnar commented 1 month ago

Can you pass the path of the .mp3 file to the audio_path variable? It cannot read the audio file, so it returns None.

IzzyHibbert commented 1 month ago

> Can you give the path of the .mp3 file to the audio_path variable? Because it cannot read the audio file. It returns to None.

I will try early tomorrow and let you know, thanks.

IzzyHibbert commented 1 month ago

OK, I found out. It turns out that the solution is not able to process the video (audio) I used:

https://www.youtube.com/watch?v=3p-VBQ4l-4s

I tried with both CUDA and CPU, and both fail. As soon as I use another video (audio) in the same installation, it works fine.

My 2 cents: the only possible cause I can see is a bug triggered when the audio starts with a few notes of a jingle and no voices (the voices only come in a few seconds later).

risedangel commented 1 month ago

In my opinion, this occurs when the code cannot produce a timestamp for the last element of "end_timestamps".
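A minimal sketch of that failure mode, written in plain Python rather than NumPy. The chunk layout (a "timestamp" pair per chunk) and the helper name closest_index are assumptions made for illustration; this is not the code from whisper_diarize.py, only the same kind of subtraction:

```python
# Assumed shape of the ASR chunks: each has a (start, end) "timestamp" pair.
# The last chunk has no end time, as when the audio cuts off mid-word.
chunks = [
    {"text": " Hello there.", "timestamp": (0.0, 2.5)},
    {"text": " How are", "timestamp": (2.5, None)},
]

end_timestamps = [chunk["timestamp"][-1] for chunk in chunks]
segment_end = 2.5  # hypothetical end time of a diarization segment


def closest_index(ends, target):
    """Index of the end timestamp closest to target (hypothetical helper)."""
    return min(range(len(ends)), key=lambda i: abs(ends[i] - target))


try:
    closest_index(end_timestamps, segment_end)
    error_message = None
except TypeError as exc:
    # The None end time cannot be subtracted from a float.
    error_message = str(exc)

print(error_message)
```

If this matches what happens in the pipeline, a single missing end timestamp is enough to make the whole call fail, regardless of CUDA or CPU.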

GuyPaddock commented 1 month ago

Also seeing this, and @risedangel seems to be correct:

Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.
Traceback (most recent call last):
  File "/mnt/array1/home/guyep/playground/whisper-plus/diarize.py", line 18, in <module>
    output_text = pipeline(audio_path, num_speakers=16)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/array1/home/guyep/miniconda3/envs/whisper-plus/lib/python3.12/site-packages/whisperplus/pipelines/whisper_diarize.py", line 171, in __call__
    upto_idx = np.argmin(np.abs(end_timestamps - end_time))
                                ~~~~~~~~~~~~~~~^~~~~~~~~~
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
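A possible workaround, sketched under assumptions: fill in any missing end timestamp before the end times are compared with the diarization segments. The helper fill_missing_end_timestamps, the chunk layout, and the choice of the audio duration as the fallback value are all hypothetical; none of this is part of WhisperPlus, and I have not tested it against the pipeline itself:

```python
def fill_missing_end_timestamps(chunks, audio_duration):
    """Return copies of the chunks with a missing end time replaced.

    Hypothetical helper. Assumes each chunk has a (start, end) "timestamp"
    pair and that a missing end means speech ran to the end of the audio.
    """
    fixed = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if end is None:
            end = audio_duration  # assumption: fall back to the audio length
        fixed.append({**chunk, "timestamp": (start, end)})
    return fixed


# Assumed example input: the last chunk has no predicted end time.
chunks = [
    {"text": " Hello there.", "timestamp": (0.0, 2.5)},
    {"text": " How are", "timestamp": (2.5, None)},
]

fixed_chunks = fill_missing_end_timestamps(chunks, audio_duration=4.0)
end_timestamps = [chunk["timestamp"][-1] for chunk in fixed_chunks]

# With every end time now a float, the closest-end lookup no longer raises.
segment_end = 3.8
upto_idx = min(
    range(len(end_timestamps)),
    key=lambda i: abs(end_timestamps[i] - segment_end),
)
print(end_timestamps, upto_idx)
```

The input chunks are left untouched, so the original ASR output stays available for debugging.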