pinokiofactory / cogstudio

Unable to Load Vocabulary from Tokenizer Model | models--THUDM--CogVideoX-2b and models--THUDM--CogVideoX-5b #26

Open Visual-Mistress opened 1 week ago

Visual-Mistress commented 1 week ago

I have been experiencing an issue with loading a tokenizer model for over two weeks. Despite multiple attempts to reinstall both the neural network and the model, I still encounter the same error. The error message states that it is unable to load the vocabulary from the specified file, which suggests the tokenizer model is either inaccessible or corrupted.

I have checked and ensured that all required libraries are installed and accessible, and the model files are present and correctly structured. I am running this on a system equipped with an NVIDIA GeForce RTX 4060 Ti with 16 GB of VRAM.

Despite following various troubleshooting steps and seeking solutions on forums, I have been unable to resolve the issue. I would greatly appreciate any guidance or suggestions on what might be causing this error.

Traceback (most recent call last):
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\gradio\queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\gradio\route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\gradio\blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\gradio\blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\gradio\utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\gradio\utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\cogstudio.py", line 727, in generate
    latents, seed = infer(
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\cogstudio.py", line 217, in infer
    init(name, image_input, video_input, dtype, full_gpu)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\cogstudio.py", line 57, in init
    init_txt2vid(name, dtype_str, full_gpu)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\cogstudio.py", line 84, in init_txt2vid
    dtype = init_core(name, dtype_str)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\cogstudio.py", line 67, in init_core
    pipe = CogVideoXPipeline.from_pretrained(name, torch_dtype=dtype).to(device)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 896, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\diffusers\pipelines\pipeline_loading_utils.py", line 704, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\transformers\tokenization_utils_base.py", line 2213, in from_pretrained
    return cls._from_pretrained(
  File "D:\cogstudio\CogVideo\inference\gradio_composite_demo\env\Lib\site-packages\transformers\tokenization_utils_base.py", line 2462, in _from_pretrained
    raise OSError(
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.

Visual-Mistress commented 1 week ago

I finally found the solution to my problem on my own. If anyone else runs into the same issue, here is the solution:

  1. Install the neural network itself on a drive whose path contains no Cyrillic (Russian) characters.

  2. If your models are downloaded by default to a path like C:\Users\[Russian username]\.cache\huggingface\hub\models--THUDM--CogVideoX-2

Then you need to move .cache to drive D:

Setting the HF_HOME environment variable to D:\.cache\huggingface tells Hugging Face to use that path for its cache. The next time Hugging Face programs run, models will be downloaded to and looked up at the new location on drive D, and the error will no longer appear.
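
For reference, HF_HOME has to be set before the Hugging Face libraries read it. On Windows it can be set persistently with setx HF_HOME "D:\.cache\huggingface" (it takes effect in new terminals), or, as a rough sketch under the same assumption about the new cache location, from Python before anything from diffusers or transformers is imported:

import os

# Point the Hugging Face cache at the new, Cyrillic-free location.
# This must run before huggingface_hub / transformers / diffusers are imported,
# because they read HF_HOME when computing their cache paths.
os.environ["HF_HOME"] = r"D:\.cache\huggingface"

import torch
from diffusers import CogVideoXPipeline  # imported only after HF_HOME is set

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)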