Loading checkpoint shards: 100%|████████████████████████████████| 3/3 [00:59<00:00, 19.92s/it]
[INFO|modeling_utils.py:4473] 2024-08-19 20:44:57,702 >> All model checkpoint weights were used when initializing Gemma2ForCausalLM.
[INFO|modeling_utils.py:4481] 2024-08-19 20:44:57,702 >> All the weights of Gemma2ForCausalLM were initialized from the model checkpoint at /media/models/models/google/gemma-2b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Gemma2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-19 20:44:57,705 >> loading configuration file /media/models/models/google/gemma-2b/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-19 20:44:57,705 >> Generate config GenerationConfig {
"bos_token_id": 2,
"cache_implementation": "hybrid",
"eos_token_id": 1,
"pad_token_id": 0
}
Traceback (most recent call last):
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/route_utils.py", line 288, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/blocks.py", line 1931, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/blocks.py", line 1528, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 671, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 664, in anext
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 647, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/gradio/utils.py", line 809, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/webui/chatter.py", line 107, in load_model
super().__init__(args)
File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/chat/chat_model.py", line 44, in __init__
self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/chat/hf_engine.py", line 58, in init
self.model = load_model(
^^^^^^^^^^^
File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/loader.py", line 162, in load_model
model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/adapter.py", line 310, in init_adapter
model = _setup_lora_tuning(
^^^^^^^^^^^^^^^^^^^
File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/adapter.py", line 199, in _setup_lora_tuning
model = load_unsloth_peft_model(config, model_args, is_trainable=is_trainable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/models/model-tools/LLaMA-Factory-main7/src/llamafactory/model/model_utils/unsloth.py", line 95, in load_unsloth_peftmodel
model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/models/loader.py", line 301, in from_pretrained
model, tokenizer = dispatch_model.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/models/llama.py", line 1412, in from_pretrained
tokenizer = load_correct_tokenizer(
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 563, in load_correct_tokenizer
chat_template = fix_chat_template(tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/llama-board7/lib/python3.11/site-packages/unsloth/tokenizer_utils.py", line 638, in fix_chat_template
raise RuntimeError(
RuntimeError: Unsloth: The tokenizer `saves/Gemma-2-2B/lora/train_2024-08-19-20-11-09`
does not have a {% if add_generation_prompt %} for generation purposes.
Please file a bug report immediately - thanks!
Unsloth version: 2024.8
LLaMA-Factory: 0.8.4.dev0
Models: DeepSeek-Coder-1.3B, Gemma-2-2B
transformers: 4.43.4
xformers: 0.0.27.post2
trl: 0.9.6
torch: 2.4.0
CUDA: 12.1
OS: Ubuntu 22.04
GPU: NVIDIA A6000 48G
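For reference (this is not from the original logs): the Unsloth check that raises here, fix_chat_template, complains when the tokenizer saved alongside the LoRA adapter has a chat template without an add_generation_prompt branch. A minimal sketch to inspect that template directly, assuming the adapter directory from the error message contains the saved tokenizer files:

```python
# Diagnostic sketch: inspect the chat template saved with the adapter's tokenizer.
# Assumption: the path below (copied from the error above) holds the tokenizer files.
from transformers import AutoTokenizer

adapter_dir = "saves/Gemma-2-2B/lora/train_2024-08-19-20-11-09"
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

template = tokenizer.chat_template  # None if no chat template was saved
if template is None:
    print("no chat_template saved with this tokenizer")
elif "add_generation_prompt" not in template:
    print("chat_template present, but missing the add_generation_prompt branch Unsloth expects")
else:
    print("chat_template contains add_generation_prompt; Unsloth's check should pass")
```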