m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
huggingface error #261

Open TokerX opened 1 year ago

TokerX commented 1 year ago

I get the following errors when running whisperX:

C:\Windows\System32>whisperx --model large-v2 --language nl "F:\Movies\Ad Fundum (1993)\Ad Fundum (1993).avi" --compute_type float32
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file C:\Users\svenc\.cache\torch\whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cpu. Bad things might happen unless you revert torch to 1.x.

Performing transcription...
[WinError 5] Access is denied: '..\..\blobs\bb3285bc209d674e3f88646bdfd327bfe43b60da' -> 'C:\Users\svenc\.cache\huggingface\hub\models--jonatasgrosman--wav2vec2-large-xlsr-53-dutch\snapshots\46f221381d200f7bef268309b3f02023ccf11fcc\preprocessor_config.json'
Error loading model from huggingface, check https://huggingface.co/models for finetuned wav2vec2.0 models
Traceback (most recent call last):
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\wav2vec2\processing_wav2vec2.py", line 51, in from_pretrained
    return super().from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\processing_utils.py", line 184, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\processing_utils.py", line 228, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\feature_extraction_utils.py", line 329, in from_pretrained
    feature_extractor_dict, kwargs = cls.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\feature_extraction_utils.py", line 429, in get_feature_extractor_dict
    resolved_feature_extractor_file = cached_file(
                                      ^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
                    ^^^^^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\utils\_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py", line 1320, in hf_hub_download
    _create_symlink(blob_path, pointer_path, new_blob=False)
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py", line 911, in _create_symlink
    os.symlink(src_rel_or_abs, abs_dst)
PermissionError: [WinError 5] Access is denied: '..\..\blobs\bb3285bc209d674e3f88646bdfd327bfe43b60da' -> 'C:\Users\svenc\.cache\huggingface\hub\models--jonatasgrosman--wav2vec2-large-xlsr-53-dutch\snapshots\46f221381d200f7bef268309b3f02023ccf11fcc\preprocessor_config.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Scripts\whisperx.exe\__main__.py", line 7, in <module>
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisperx\transcribe.py", line 166, in cli
    align_model, align_metadata = load_align_model(align_language, device, model_name=align_model)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\svenc\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisperx\alignment.py", line 73, in load_align_model
    raise ValueError(f'The chosen align_model "{model_name}" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)')
ValueError: The chosen align_model "jonatasgrosman/wav2vec2-large-xlsr-53-dutch" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)

The last one talks about torchaudio as well, but seeing as everything else is about huggingface I guess that's where the problem is.

"PermissionError: [WinError 5] Access is denied: " makes it seem like a Windows thing or something, but I'm running as admin so it's not a question of having rights.

sorgfresser commented 1 year ago

The torchaudio message is expected on Windows, since soundfile is the default backend there, so that shouldn't be the issue. The raised ValueError is a bit misleading: the underlying failure is the PermissionError. Running as admin should take care of permissions, but I'm not too familiar with Windows, so just to rule it out: can you try manually setting the permissions on C:\Users\svenc\.cache\huggingface? Since it fails on symlink creation, the link target ..\..\blobs\bb3285bc209d674e3f88646bdfd327bfe43b60da could be the problem too. That path is inside the same cache directory, so setting the permissions on C:\Users\svenc\.cache\huggingface should be enough.
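
To check the symlink theory separately from whisperX, a small probe like the one below might help. It is only a sketch: the helper name `can_create_symlink` is made up for illustration, and the cache path is assumed from the paths in the traceback. It makes the same kind of call that fails there, `os.symlink` with a relative source, inside a throwaway subdirectory of the directory you pass in.

```python
import os
import tempfile
from pathlib import Path


def can_create_symlink(directory: Path) -> bool:
    """Return True if this process can create a relative symlink inside `directory`.

    Hypothetical helper for illustration; `directory` must already exist.
    """
    # Work in a temporary subdirectory so nothing is left behind afterwards.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = Path(tmp) / "target.txt"
        link = Path(tmp) / "link.txt"
        target.write_text("probe")
        try:
            # Relative source, resolved against the link's own directory,
            # similar to the '..\..\blobs\...' source in the traceback.
            os.symlink(target.name, link)
        except OSError as exc:  # PermissionError is a subclass of OSError
            print(f"symlink creation failed: {exc!r}")
            return False
        # Reading through the link confirms it resolves to the target.
        return link.read_text() == "probe"


if __name__ == "__main__":
    # Assumed cache location, taken from the paths shown in the traceback.
    cache_dir = Path.home() / ".cache" / "huggingface"
    if cache_dir.is_dir():
        print(cache_dir, "->", can_create_symlink(cache_dir))
    else:
        print(cache_dir, "does not exist")
```

If this prints `False` for the cache directory from the same prompt that runs whisperx, the problem is symlink creation itself rather than anything specific to the model being downloaded.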