hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Error when running inference with unsloth (training works fine): does not have a {% if add_generation_prompt %} for generation purposes. #5213

Open · quida01 opened this issue 2 months ago

quida01 commented 2 months ago

Environment:

- unsloth: 2024.8
- llamafactory: 0.8.4.dev0
- models: DeepSeek-Coder-1.3B, Gemma-2-2B
- transformers: 4.43.4
- xformers: 0.0.27.post2
- trl: 0.9.6
- torch: 2.4.0
- cuda: 12.1
- OS: Ubuntu 22.04
- GPU: NVIDIA A6000 48G

```
[INFO|modeling_utils.py:3641] 2024-08-19 20:43:57,693 >> loading weights file /media/models/models/google/gemma-2b/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-19 20:43:57,694 >> Instantiating Gemma2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:1038] 2024-08-19 20:43:57,697 >> Generate config GenerationConfig {
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 1,
  "pad_token_id": 0
}

Loading checkpoint shards: 100%|████████████████████████████████| 3/3 [00:59<00:00, 19.92s/it]
[INFO|modeling_utils.py:4473] 2024-08-19 20:44:57,702 >> All model checkpoint weights were used when initializing Gemma2ForCausalLM.
[INFO|modeling_utils.py:4481] 2024-08-19 20:44:57,702 >> All the weights of Gemma2ForCausalLM were initialized from the model checkpoint at /media/models/models/google/gemma-2b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Gemma2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-19 20:44:57,705 >> loading configuration file /media/models/models/google/gemma-2b/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-19 20:44:57,705 >> Generate config GenerationConfig {
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 1,
  "pad_token_id": 0
}
```

```
Traceback (most recent call last):
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/route_utils.py", line 288, in call_process_api
    output = await app.get_blocks().process_api(
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/blocks.py", line 1931, in process_api
    result = await self.call_function(
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/blocks.py", line 1528, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/webui/chatter.py", line 107, in load_model
    super().__init__(args)
  File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/chat/chat_model.py", line 44, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/chat/hf_engine.py", line 58, in __init__
    self.model = load_model(
  File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/loader.py", line 162, in load_model
    model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
  File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/adapter.py", line 310, in init_adapter
    model = _setup_lora_tuning(
  File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/adapter.py", line 199, in _setup_lora_tuning
    model = load_unsloth_peft_model(config, model_args, is_trainable=is_trainable)
  File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/model_utils/unsloth.py", line 95, in load_unsloth_peft_model
    model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/models/loader.py", line 301, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/models/llama.py", line 1412, in from_pretrained
    tokenizer = load_correct_tokenizer(
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 563, in load_correct_tokenizer
    chat_template = fix_chat_template(tokenizer)
  File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 638, in fix_chat_template
    raise RuntimeError(
RuntimeError: Unsloth: The tokenizer saves/Gemma-2-2B/lora/train_2024-08-19-20-11-09
does not have a {% if add_generation_prompt %} for generation purposes.
Please file a bug report immediately - thanks!
```
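The failing check is unsloth's `fix_chat_template`, which rejects the tokenizer saved under `saves/Gemma-2-2B/lora/train_2024-08-19-20-11-09` because its chat template has no `{% if add_generation_prompt %}` branch. A minimal workaround sketch, not a confirmed fix from this thread: copy the base model's chat template into the adapter directory's tokenizer before loading it for inference. The paths below are taken from the logs above and are placeholders for your own setup.

```python
# Hypothetical workaround: patch the chat_template of the tokenizer saved in the
# LoRA output directory so it passes unsloth's fix_chat_template() check.
from transformers import AutoTokenizer

base_dir = "/media/models/models/google/gemma-2b"                 # base model path (from the log)
adapter_dir = "saves/Gemma-2-2B/lora/train_2024-08-19-20-11-09"   # LoRA output dir (from the error)

base_tok = AutoTokenizer.from_pretrained(base_dir)
adapter_tok = AutoTokenizer.from_pretrained(adapter_dir)

# unsloth requires the template to branch on add_generation_prompt; check for it first.
template = adapter_tok.chat_template or ""
if "add_generation_prompt" not in template:
    # Reuse the base model's template and re-save the tokenizer into the adapter dir.
    adapter_tok.chat_template = base_tok.chat_template
    adapter_tok.save_pretrained(adapter_dir)
    print("Patched chat_template in", adapter_dir)
else:
    print("chat_template already handles add_generation_prompt")
```

After re-saving the tokenizer, reloading the adapter through the web UI should get past the unsloth template check, assuming the base tokenizer's template itself contains the generation-prompt branch.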

GSalimp commented 2 months ago

Any updates on this? I've been having the same issue with every model I try to load.

frei-x commented 1 month ago

does not have a {% if add_generation_prompt %} for generation purposes. Please file a bug report immediately - thanks!