jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

ValueError: Expected parameter logits Error when using VAD #118

Closed acul3 closed 1 year ago

acul3 commented 1 year ago

I get an error when using VAD.

It works fine when vad=True is not passed.

import stable_whisper

model = stable_whisper.load_model('large-v2')
# this modified model runs just like the original model but accepts additional arguments
result = model.transcribe('/content/drive/MyDrive/meet2.wav', language="id", vad=True)

Here is the sample audio:

https://drive.google.com/file/d/1AMidkSBtZdDFk34Jh7pvNQNJ9W5Je36e/view?usp=share_link

The error:

ValueError: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0')
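
For readers who hit the same message: the wording comes from PyTorch's argument validation in torch.distributions.Categorical, which rejects logits that contain NaN. The snippet below is a minimal sketch that reproduces only that validation step outside of stable-ts. The helper name try_sample and the tiny vocabulary size are illustrative assumptions, and it says nothing about why the VAD path produced NaN logits in the first place.

```python
import torch
from torch.distributions import Categorical


def try_sample(logits: torch.Tensor) -> str:
    """Return 'ok' if a token can be sampled, else the first line of the validation error.

    Hypothetical helper for illustration; it is not part of stable-ts or Whisper.
    """
    try:
        # validate_args=True makes the constructor check that every logit is a real number
        Categorical(logits=logits, validate_args=True).sample()
        return "ok"
    except ValueError as err:
        return str(err).splitlines()[0]


vocab_size = 8  # tiny stand-in for the 51865-entry vocabulary in the traceback

good = try_sample(torch.zeros(1, vocab_size))                 # finite logits
bad = try_sample(torch.full((1, vocab_size), float("nan")))   # all-NaN logits, as in the report

print(good)
print(bad)
```

Finite logits sample normally, while the all-NaN tensor raises the same kind of ValueError shown above, so the traceback indicates that the decoder received NaN logits rather than a problem in the sampling call itself.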
jianfch commented 1 year ago

Should be fixed in the latest version.

acul3 commented 1 year ago

I can confirm that it is fixed.

Thanks.