Reminder
Reproduction
Following https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-comparison, I am training with a Llama 3 8B model, but the GPU memory usage differs a lot from what the documentation reports: on a single GPU, cutoff_len = 65536 nearly fills 80 GB, and cutoff_len = 100k goes OOM. Is anything wrong with my setup?

Run command:

python src/train.py \
    --stage sft \
    --do_train \
    --model_name_or_path gradientai/Llama-3-8B-Instruct-Gradient-1048k \
    --dataset summary_train \
    --template llama3 \
    --cutoff_len 102400 \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir output_models/sft \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 200 \
    --learning_rate 1e-5 \
    --num_train_epochs 5.0 \
    --plot_loss \
    --fp16 \
    --flash_attn fa2 \
    --shift_attn \
    --use_unsloth \
    --quantization_bit 4 \
    --overwrite_output_dir
Output:

...
[INFO|modeling_utils.py:1494] 2024-05-08 20:23:20,149 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:928] 2024-05-08 20:23:20,150 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:10<00:00, 2.70s/it]
[INFO|modeling_utils.py:4170] 2024-05-08 20:23:43,299 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4178] 2024-05-08 20:23:43,300 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /mnt/workspace/public_models/gradientai/Llama-3-8B-Instruct-Gradient-1048k.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-05-08 20:23:43,304 >> loading configuration file /mnt/workspace/public_models/gradientai/Llama-3-8B-Instruct-Gradient-1048k/generation_config.json
[INFO|configuration_utils.py:928] 2024-05-08 20:23:43,304 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,177 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,177 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,177 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,177 >> loading file tokenizer_config.json
[WARNING|logging.py:314] 2024-05-08 20:23:53,456 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,459 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,459 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,459 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 20:23:53,459 >> loading file tokenizer_config.json
[WARNING|logging.py:314] 2024-05-08 20:23:53,706 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:329] 2024-05-08 20:23:53,723 >> /mnt/workspace/public_models/gradientai/Llama-3-8B-Instruct-Gradient-1048k does not have a padding token! Will use pad_token = <|reserved_special_token250|>.
05/08/2024 20:23:54 - INFO - llmtuner.model.utils.checkpointing - Gradient checkpointing enabled.
05/08/2024 20:23:54 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
[WARNING|logging.py:329] 2024-05-08 20:23:54,268 >> Unsloth cannot patch MLP layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
[WARNING|logging.py:329] 2024-05-08 20:23:54,268 >> Unsloth cannot patch Attention layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
[WARNING|logging.py:329] 2024-05-08 20:23:54,268 >> Unsloth cannot patch O projection layer with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
[WARNING|logging.py:329] 2024-05-08 20:23:54,269 >> Unsloth 2024.4 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
05/08/2024 20:23:54 - INFO - llmtuner.model.loader - trainable params: 3407872 || all params: 8033669120 || trainable%: 0.0424
[INFO|trainer.py:626] 2024-05-08 20:23:54,287 >> Using auto half precision backend
05/08/2024 20:23:54 - WARNING - llmtuner.extras.callbacks - Previous trainer log in this folder will be deleted.
[WARNING|logging.py:329] 2024-05-08 20:23:54,418 >> ==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\ /| Num examples = 2,376 | Num Epochs = 5
O^O/ \/ \ Batch size per device = 1 | Gradient Accumulation steps = 4
\ / Total batch size = 4 | Total steps = 2,970
"-__-" Number of trainable parameters = 3,407,872
  0%|          | 0/2970 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/mnt/workspace/LLM/LLaMA-Factory/src/train.py", line 14, in <module>
    main()
  File "/mnt/workspace/LLM/LLaMA-Factory/src/train.py", line 5, in main
    run_exp()
  File "/mnt/workspace/LLM/LLaMA-Factory/src/llmtuner/train/tuner.py", line 33, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/mnt/workspace/LLM/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 73, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "<string>", line 361, in _fast_inner_training_loop
  File "/home/pai/envs/llama/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/pai/envs/llama/lib/python3.10/site-packages/accelerate/utils/operations.py", line 789, in convert_to_fp32
    return recursively_apply(_convert_to_fp32, tensor, test_type=_is_fp16_bf16_tensor)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/accelerate/utils/operations.py", line 118, in recursively_apply
    {
  File "/home/pai/envs/llama/lib/python3.10/site-packages/accelerate/utils/operations.py", line 119, in <dictcomp>
    k: recursively_apply(
  File "/home/pai/envs/llama/lib/python3.10/site-packages/accelerate/utils/operations.py", line 126, in recursively_apply
    return func(data, *args, **kwargs)
  File "/home/pai/envs/llama/lib/python3.10/site-packages/accelerate/utils/operations.py", line 781, in _convert_to_fp32
    return tensor.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 48.93 GiB. GPU
  0%|
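A quick sanity check on the failed allocation (an illustrative back-of-the-envelope calculation, not taken from the log): the traceback ends in accelerate's `convert_to_fp32`, which upcasts the model's half-precision logits to fp32. For Llama 3's 128,256-token vocabulary, a batch of 1 and cutoff_len = 102400, that single fp32 logits tensor is almost exactly the 48.93 GiB the error reports:

```python
# Size of the fp32 logits tensor that convert_to_fp32 tries to allocate.
# Values come from the run command above and the Llama 3 config.
batch_size = 1
seq_len = 102_400      # --cutoff_len
vocab_size = 128_256   # Llama 3 vocabulary size
fp32_bytes = 4         # bytes per float32 element

logits_bytes = batch_size * seq_len * vocab_size * fp32_bytes
print(f"{logits_bytes / 2**30:.2f} GiB")  # → 48.93 GiB
```

This suggests the OOM is dominated by the full-precision upcast of the logits over the entire 100k sequence, which grows linearly with cutoff_len, rather than by attention activations alone.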
Expected behavior
Training with cutoff_len = 100k should run normally on an 80 GB A100.
System Info
transformers version: 4.40.0

Others
No response