"input with shape (1, 80, 3000)" after updating to latest version

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD 2-Clause "Simplified" License

"input with shape (1, 80, 3000)" after updating to latest version #611

Open iAladeen opened 10 months ago

iAladeen commented 10 months ago

After updating to the latest version, I get this error when transcribing with whisperx large-v3. Can someone help me fix it?

Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu117. Bad things might happen unless you revert torch to 1.x.

The last part says:

ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead
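For context, the two shapes in this error differ only in the second dimension, the number of mel-frequency bins: large-v3 expects 128, while earlier Whisper models use 80. The message therefore suggests that the audio features were computed with an 80-bin extractor and then passed to a 128-bin model. Below is a minimal sketch of that kind of check; `check_features_shape` and the two constants are hypothetical names for illustration, not the library's actual code.

```python
from typing import Tuple

# Mel-bin counts: earlier Whisper checkpoints use 80, large-v3 uses 128.
N_MELS_LEGACY = 80
N_MELS_LARGE_V3 = 128


def check_features_shape(shape: Tuple[int, int, int], expected_n_mels: int) -> None:
    """Hypothetical helper: raise if the mel dimension does not match the model."""
    batch, n_mels, n_frames = shape
    if n_mels != expected_n_mels:
        expected = (batch, expected_n_mels, n_frames)
        raise ValueError(
            f"Invalid input features shape: expected an input with shape "
            f"{expected}, but got an input with shape {shape} instead"
        )
```

Calling `check_features_shape((1, 80, 3000), N_MELS_LARGE_V3)` raises a `ValueError` with the same wording as the one reported above, whereas `(1, 128, 3000)` passes silently.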

sogris commented 10 months ago

After reinstalling whisperx from a cloned copy of the repository, the "ValueError: Invalid input features shape" error was resolved:

$ git clone https://github.com/m-bain/whisperX.git
$ cd whisperX
$ pip install -e .

rrfaria commented 10 months ago

Same issue here, even after reinstalling everything.

arbianqx commented 10 months ago

I'm encountering the same issue. Any update on this?

iAladeen commented 10 months ago

> I'm encountering the same issue. Any update on this?

Try this command:

pip install --upgrade --force-reinstall --no-deps git+https://github.com/m-bain/whisperx.git
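Because `--no-deps` tells pip to skip dependencies, this command reinstalls only whisperx itself and leaves the other installed packages at their current versions. To see what is actually installed afterwards, a small check with the standard library's `importlib.metadata` can help. This is only a sketch: `installed_versions` is a hypothetical helper, and the package list is an assumption about which distributions are relevant here.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(packages: Sequence[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of packages worth checking; adjust to your environment.
    names = ["whisperx", "faster-whisper", "ctranslate2", "torch", "pyannote.audio"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output before and after the reinstall shows whether the whisperx version really changed.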

zillionare commented 1 week ago

This also happens with the distilled large-v3 model: https://huggingface.co/Systran/faster-distil-whisper-large-v3

Upgrading whisperx does not solve the issue.
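One way to narrow this down is to check how many mel bins the downloaded model itself declares. The sketch below assumes the local model directory contains a `preprocessor_config.json` file with a `feature_size` field, as Hugging Face Whisper checkpoints typically do; `expected_n_mels` is a hypothetical helper, and it returns None when the file or the field is absent.

```python
import json
from pathlib import Path
from typing import Optional


def expected_n_mels(model_dir: str) -> Optional[int]:
    """Read the declared mel-bin count from a model directory, if available."""
    # Assumption: the directory ships a preprocessor_config.json file.
    config_path = Path(model_dir) / "preprocessor_config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    value = config.get("feature_size")
    return int(value) if value is not None else None
```

If this returns 128 for the model while the error still reports features with 80 bins, the mismatch is on the feature-extraction side rather than in the model files.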