m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

torch 1.10.0+cu102, yours is 2.0.0 #822

Open madey83 opened 2 months ago

madey83 commented 2 months ago

(whisperx) C:\Users\lukas>whisperx --language en C:\Users\lukas\audio.mka
torchvision is not available - cannot save figures
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.3.0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\lukas\.cache\torch\whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0. Bad things might happen unless you revert torch to 1.x.

Performing transcription...
Traceback (most recent call last):
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\lukas\miniconda3\envs\whisperx\Scripts\whisperx.exe\__main__.py", line 7, in <module>
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\whisperx\transcribe.py", line 176, in cli
    result = model.transcribe(audio, batch_size=batch_size, chunk_size=chunk_size, print_progress=print_progress)
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 218, in transcribe
    for idx, out in enumerate(self.__call__(data(audio, vad_segments), batch_size=batch_size, num_workers=num_workers)):
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\transformers\pipelines\pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\transformers\pipelines\base.py", line 1112, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 152, in _forward
    outputs = self.model.generate_segment_batched(model_inputs['inputs'], self.tokenizer, self.options)
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 47, in generate_segment_batched
    encoder_output = self.encode(features)
  File "C:\Users\lukas\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 86, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: Library cublas64_12.dll is not found or cannot be loaded
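The final RuntimeError says that cublas64_12.dll could not be found or loaded. One quick thing to rule out is whether that file exists in any directory on PATH. The sketch below does only that. It assumes the library is looked up through PATH, which may not match how the failing component actually resolves it, and `find_on_path` is a hypothetical helper name, not part of WhisperX.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_on_path(filename: str, path_env: Optional[str] = None) -> List[Path]:
    """Return the full path of `filename` in each PATH directory that contains it.

    Hypothetical helper for illustration; not part of WhisperX.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    hits = []
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = Path(directory) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # File name taken from the RuntimeError in the traceback.
    matches = find_on_path("cublas64_12.dll")
    if matches:
        for match in matches:
            print("found:", match)
    else:
        print("cublas64_12.dll is not in any PATH directory")
```

If nothing is found, the CUDA 12 runtime libraries are probably either not installed or not on PATH in this environment. If the file is found and the error persists, the cause lies elsewhere.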

burnedsyn commented 1 month ago

Hi, I have the same problem on Debian 12. I followed the process exactly and still get this error. As far as I can see so far, the transcribed text is correct (I'm working in French), but on every run I also get these messages about the checkpoint and about the PyTorch version differing from the one the model was trained with. The one thing I did differently from the README tutorial is the install line, because I run on CPU rather than CUDA, so I used this one:

conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 cpuonly -c pytorch

instead of this one

conda install pytorch==2.0.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0. Bad things might happen unless you revert torch to 1.x.

Performing transcription...
Performing alignment...
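These warnings compare the versions recorded in the checkpoint with whatever is installed in the active environment, so it can help to confirm what the environment actually contains. A minimal sketch using only the standard library follows. The distribution names `torch` and `pyannote.audio` are assumptions about how the packages are registered, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are an assumption; adjust if your environment differs.
    for name, ver in installed_versions(["torch", "pyannote.audio"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it inside the activated whisperx environment, so that it reports the same packages the whisperx command sees.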

I also used the CPU-only PyTorch install line and set the compute type to 8-bit (int8). My command line is below. I have tried it with the v1, v2 and v3 models and get exactly the same problem with all of them.

(whisperx) goodmoodgoodlearning@panel:~/public_html/wp-content/uploads$ whisperx --model large-v3 --compute_type int8 --language fr input.mp4

My system is Debian 12 on an Intel(R) Xeon(R) CPU E3-1270 v6 (8 cores) with 32 GB RAM. At the end of the run I do get the different output files from WhisperX (txt, svt, etc.) and I can open them in any editor. There is just one thing: if I try to view them with less, it says the file is binary, and when I open it anyway all accented letters appear white and are not displayed on screen. I think a UTF-8 setting or something similar is missing (I have only just noticed this, so I'll look into it).
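When less treats a text file as binary, the cause can be either the file's encoding or the terminal's locale settings. Which one applies here is not known. A minimal sketch to tell the two cases apart: check whether the output file decodes as UTF-8, and print the encoding the current locale prefers. `is_valid_utf8` is a hypothetical helper and the script name in the comment is a placeholder.

```python
import locale
import sys
from pathlib import Path


def is_valid_utf8(path: str) -> bool:
    """Return True if the whole file decodes as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Usage (placeholder names): python check_utf8.py input.txt
    for name in sys.argv[1:]:
        print(name, "valid UTF-8:", is_valid_utf8(name))
    # Encoding that the current locale asks programs to use for text.
    print("locale encoding:", locale.getpreferredencoding(False))
```

If the file is valid UTF-8 but the locale encoding is not UTF-8, the display problem is more likely in the shell environment than in the WhisperX output.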

So if anybody has an idea where this comes from, or how to get the system working fully without these errors, I'd be glad to hear it.

Have a nice day.
Tim