Reminder
[X] I have read the README and searched the existing issues.
Reproduction
When I ran the GaLore tuning method with !bash galore_adamw.sh in Colab, I ran into an error. Thank you for your great work in adding GaLore support so quickly; could you help me resolve this bug? The output is as follows:
[INFO|modeling_utils.py:3257] 2024-03-10 10:12:34,371 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-03-10 10:12:34,382 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-03-10 10:12:34,384 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[INFO|modeling_utils.py:3992] 2024-03-10 10:12:37,382 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|modeling_utils.py:4000] 2024-03-10 10:12:37,382 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen1.5-0.5B-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-03-10 10:12:37,654 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/generation_config.json
[INFO|configuration_utils.py:845] 2024-03-10 10:12:37,655 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"top_p": 0.8
}
03/10/2024 10:12:37 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
03/10/2024 10:12:37 - INFO - llmtuner.model.adapter - Fine-tuning method: Full
03/10/2024 10:12:37 - INFO - llmtuner.model.loader - trainable params: 463987712 || all params: 463987712 || trainable%: 100.0000
/usr/local/lib/python3.10/dist-packages/galore_torch/adamw.py:48: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
03/10/2024 10:12:38 - INFO - llmtuner.train.utils - Using GaLore optimizer, may cause hanging at the start of training, wait patiently.
[INFO|trainer.py:601] 2024-03-10 10:12:38,151 >> Using auto half precision backend
[INFO|trainer.py:1812] 2024-03-10 10:12:38,506 >> ***** Running training *****
[INFO|trainer.py:1813] 2024-03-10 10:12:38,506 >> Num examples = 2,700
[INFO|trainer.py:1814] 2024-03-10 10:12:38,506 >> Num Epochs = 3
[INFO|trainer.py:1815] 2024-03-10 10:12:38,506 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1818] 2024-03-10 10:12:38,506 >> Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:1819] 2024-03-10 10:12:38,506 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1820] 2024-03-10 10:12:38,506 >> Total optimization steps = 8,100
[INFO|trainer.py:1821] 2024-03-10 10:12:38,507 >> Number of trainable parameters = 463,987,712
0% 0/8100 [00:00<?, ?it/s]Traceback (most recent call last):
File "/content/LLaMA-Factory/examples/extras/galore/../../../src/train_bash.py", line 14, in
main()
File "/content/LLaMA-Factory/examples/extras/galore/../../../src/train_bash.py", line 5, in main
run_exp()
File "/content/LLaMA-Factory/src/llmtuner/train/tuner.py", line 32, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/content/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 73, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1624, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1961, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2911, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1964, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
return user_fn(self, *args)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 288, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/content/LLaMA-Factory/src/llmtuner/train/utils.py", line 228, in optimizer_hook
optimizer_dict[param].step()
File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/galore_torch/adamw.py", line 96, in step
grad = state["projector"].project(grad, state["step"])
File "/usr/local/lib/python3.10/dist-packages/galore_torch/galore_projector.py", line 15, in project
if full_rank_grad.shape[0] >= full_rank_grad.shape[1]:
IndexError: tuple index out of range
0% 0/8100 [00:02<?, ?it/s]
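If it helps, here is a minimal sketch of the failing check (my own illustration, not the library code). The comparison at galore_projector.py line 15 reads shape[0] and shape[1], so it assumes every gradient handed to the GaLore projector is 2-D; a gradient with only one dimension (presumably a bias or norm weight in my run) makes shape[1] raise the IndexError. The shapes below are just examples:

import torch

grad_2d = torch.randn(1024, 2816)  # e.g. an MLP weight gradient: the check works
grad_1d = torch.randn(1024)        # e.g. a bias / RMSNorm weight gradient: the check fails

for grad in (grad_2d, grad_1d):
    try:
        # the comparison from galore_projector.py line 15 in the traceback above
        tall = grad.shape[0] >= grad.shape[1]
        print(tuple(grad.shape), "tall matrix" if tall else "wide matrix")
    except IndexError as err:
        print(tuple(grad.shape), "IndexError:", err)  # tuple index out of range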
For completeness, here is the output from the beginning of the run, before the part quoted above:
2024-03-10 10:11:05.721386: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-10 10:11:05.721451: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-10 10:11:05.722779: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-10 10:11:07.454865: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
03/10/2024 10:11:11 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/vocab.json
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/merges.txt
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/tokenizer_config.json
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/tokenizer.json
[WARNING|logging.py:314] 2024-03-10 10:11:11,745 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
03/10/2024 10:11:11 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
Converting format of dataset (num_proc=16): 100% 3000/3000 [00:00<00:00, 4082.23 examples/s]
Running tokenizer on dataset (num_proc=16): 100% 3000/3000 [01:15<00:00, 39.64 examples/s]
input_ids: [33975, 25, 220, 100662, 108136, 101124, 45139, 8997, 71703, 25, 220, 114566, 100662, 108136, 101124, 45139, 48443, 16, 13, 220, 100662, 101099, 99600, 1773, 101922, 99190, 102618, 106214, 101079, 3837, 29524, 111261, 5373, 107530, 57191, 107140, 3837, 26232, 101902, 114718, 99722, 3837, 101138, 105640, 101102, 90395, 105767, 101940, 107235, 3407, 17, 13, 4891, 251, 229, 99967, 104579, 1773, 101922, 105086, 104838, 9370, 104451, 5373, 104618, 5373, 35987, 100203, 52853, 33108, 105349, 104982, 99285, 9370, 107151, 102153, 3837, 101153, 44636, 100443, 5373, 44636, 105349, 33108, 101130, 101083, 3837, 23031, 100662, 108136, 104579, 100784, 3407, 18, 13, 10236, 251, 94, 101519, 103119, 1773, 105552, 113357, 99722, 107940, 3837, 113459, 101922, 50511, 101907, 220, 22, 12, 23, 58230, 237, 13343, 9370, 105552, 1773, 104205, 105552, 105767, 106104, 101950, 3837, 101902, 101099, 102005, 90395, 100627, 108260, 33108, 118836, 1773, 151645]
inputs: Human: 保持健康的三个提示。 Assistant: 以下是保持健康的三个提示:
保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。
均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。
睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。<|im_end|>
label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 114566, 100662, 108136, 101124, 45139, 48443, 16, 13, 220, 100662, 101099, 99600, 1773, 101922, 99190, 102618, 106214, 101079, 3837, 29524, 111261, 5373, 107530, 57191, 107140, 3837, 26232, 101902, 114718, 99722, 3837, 101138, 105640, 101102, 90395, 105767, 101940, 107235, 3407, 17, 13, 4891, 251, 229, 99967, 104579, 1773, 101922, 105086, 104838, 9370, 104451, 5373, 104618, 5373, 35987, 100203, 52853, 33108, 105349, 104982, 99285, 9370, 107151, 102153, 3837, 101153, 44636, 100443, 5373, 44636, 105349, 33108, 101130, 101083, 3837, 23031, 100662, 108136, 104579, 100784, 3407, 18, 13, 10236, 251, 94, 101519, 103119, 1773, 105552, 113357, 99722, 107940, 3837, 113459, 101922, 50511, 101907, 220, 22, 12, 23, 58230, 237, 13343, 9370, 105552, 1773, 104205, 105552, 105767, 106104, 101950, 3837, 101902, 101099, 102005, 90395, 100627, 108260, 33108, 118836, 1773, 151645]
labels: 以下是保持健康的三个提示:
保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。
均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。
睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。<|im_end|>
[INFO|configuration_utils.py:728] 2024-03-10 10:12:34,342 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/config.json
[INFO|configuration_utils.py:791] 2024-03-10 10:12:34,344 >> Model config Qwen2Config {
"_name_or_path": "Qwen/Qwen1.5-0.5B-Chat",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 2816,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"num_key_value_heads": 16,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"sliding_window": 32768,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.38.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
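For context on the optimizer_hook frame at llmtuner/train/utils.py line 228: as far as I understand, the layerwise GaLore setup creates one optimizer per parameter and steps it from a hook that fires as soon as that parameter's gradient is accumulated, which is why the optimizer step shows up inside backward() in the traceback. The sketch below follows the pattern from the GaLore README with a toy model and example hyperparameters; it is not the actual llmtuner code. It only puts 2-D weights into the GaLore group, since the projector appears to expect 2-D gradients:

import torch
from torch import nn
from galore_torch import GaLoreAdamW  # same package as in the traceback

# toy stand-in model; in the real run this is Qwen2ForCausalLM
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))

optimizer_dict = {}
for p in model.parameters():
    if not p.requires_grad:
        continue
    if p.dim() == 2:
        # GaLore group; rank/update_proj_gap/scale/proj_type are the README example values
        optimizer_dict[p] = GaLoreAdamW(
            [{"params": [p], "rank": 8, "update_proj_gap": 200, "scale": 0.25, "proj_type": "std"}],
            lr=1e-4,
        )
    else:
        # 1-D tensors (biases, norm weights) get a plain optimizer instead of GaLore
        optimizer_dict[p] = torch.optim.AdamW([p], lr=1e-4)

def optimizer_hook(p):
    # runs right after p.grad has been accumulated during backward()
    if p.grad is None:
        return
    optimizer_dict[p].step()
    optimizer_dict[p].zero_grad()

for p in model.parameters():
    if p.requires_grad:
        p.register_post_accumulate_grad_hook(optimizer_hook)

loss = model(torch.randn(4, 16)).sum()
loss.backward()  # each parameter is updated as soon as its gradient is ready

If the GaLore group is applied to a 1-D parameter instead, the same IndexError as above is raised during backward().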
Expected behavior
No response
System Info
transformers version: 4.38.2
Others
No response