m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Reverting pyannote and torch versions recommended? #306

Open federerfanatic opened 1 year ago

federerfanatic commented 1 year ago

Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0. Bad things might happen unless you revert torch to 1.x.

Note that I am running this in a conda environment on Ubuntu 22.04.

Seems to work with my versions.

Q. How do I distinguish speakers in the output files? It would be nice if the output format indicated this more clearly.

federerfanatic commented 1 year ago

The command line included --compute_type int8 --max_speakers 2.
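For context, here is a rough sketch of the equivalent Python API flow with the same settings (compute_type="int8", max_speakers=2). It follows the usage pattern from the whisperX README around that time; names such as whisperx.DiarizationPipeline and whisperx.assign_word_speakers, the model size, and the placeholder audio path and Hugging Face token are assumptions that may need adjusting for your installed version.

```python
import whisperx

device = "cpu"               # or "cuda" if available
audio_file = "audio.wav"     # placeholder input file
hf_token = "YOUR_HF_TOKEN"   # placeholder token for the pyannote diarization model

# 1. Transcribe with int8 compute type (mirrors --compute_type int8)
model = whisperx.load_model("large-v2", device, compute_type="int8")
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

# 2. Align for word-level timestamps
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Diarize, capping the speaker count (mirrors --max_speakers 2)
diarize_model = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
diarize_segments = diarize_model(audio, max_speakers=2)

# 4. Attach speaker labels to segments and words
result = whisperx.assign_word_speakers(diarize_segments, result)

for segment in result["segments"]:
    print(segment.get("speaker", "UNKNOWN"), segment["text"])
```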

sorgfresser commented 1 year ago

Ignore the warnings; they are expected. As for the question: check the JSON output, where each segment has an extra key indicating the speaker. I find the JSON easiest to work with, and I simply apply my own normalization afterwards.
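To make the JSON route concrete, here is a minimal sketch for reading the diarized output. It assumes the output file (placeholder name audio.json) contains a "segments" list in which each segment carries "start", "end", "text", and, when diarization was run, a "speaker" key; adjust the keys if your output differs.

```python
import json

# Placeholder path to the JSON file written by whisperX
with open("audio.json", encoding="utf-8") as f:
    data = json.load(f)

# Print a speaker-labelled transcript, one line per segment
for seg in data.get("segments", []):
    speaker = seg.get("speaker", "UNKNOWN")  # "speaker" is absent if diarization was skipped
    print(f"[{seg['start']:7.2f}-{seg['end']:7.2f}] {speaker}: {seg['text'].strip()}")
```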

gsyllas commented 1 month ago

Hello, did you solve this error?