hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Galore tuning error for IndexError: tuple index out of range #2775

Closed pariskang closed 8 months ago

pariskang commented 8 months ago

Reminder

Reproduction

When I ran the GaLore tuning method with !bash galore_adamw.sh in Colab, I unexpectedly hit an error. Thank you for adding GaLore support so quickly. Could you help me resolve this bug?

The bash script is as follows:

CUDA_VISIBLE_DEVICES=0 python ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen1.5-0.5B-Chat \
    --dataset alpaca_gpt4_zh \
    --dataset_dir ../../../data \
    --template qwen \
    --finetuning_type full \
    --use_galore \
    --galore_layerwise \
    --galore_target mlp,self_attn \
    --galore_rank 128 \
    --output_dir content/LLaMA-Factory/saves/Qwen1.5-0.5B-Chat/full1 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --fp16
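
For context on the --galore_layerwise flag used above: judging from the optimizer_hook frames in the traceback further down, layer-wise GaLore appears to keep one small optimizer per parameter and step it from a gradient hook inside the backward pass, rather than taking a single global optimizer step. A rough sketch of that pattern is below; the stand-in model and names are illustrative, not LLaMA-Factory's actual API, and the hook requires PyTorch 2.1+:

import torch

# Stand-in module; a real run would use the loaded Qwen2 model instead.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 8))

# One small optimizer per trainable parameter, so full optimizer state for all
# layers never has to live in memory at the same time.
optimizer_dict = {
    p: torch.optim.AdamW([p], lr=5e-5)
    for p in model.parameters() if p.requires_grad
}

def optimizer_hook(param: torch.Tensor) -> None:
    # Step and clear this parameter as soon as its gradient is fully accumulated.
    optimizer_dict[param].step()
    optimizer_dict[param].zero_grad()

# The hook fires during loss.backward(), which is why the failing frames in the
# traceback below sit underneath torch.autograd's backward machinery.
for p in model.parameters():
    if p.requires_grad:
        p.register_post_accumulate_grad_hook(optimizer_hook)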

The printed output is as follows:

2024-03-10 10:11:05.721386: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-10 10:11:05.721451: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-10 10:11:05.722779: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-10 10:11:07.454865: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
03/10/2024 10:11:11 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/vocab.json
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/merges.txt
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/tokenizer_config.json
[INFO|tokenization_utils_base.py:2046] 2024-03-10 10:11:11,497 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/tokenizer.json
[WARNING|logging.py:314] 2024-03-10 10:11:11,745 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
03/10/2024 10:11:11 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_zh.json...
Converting format of dataset (num_proc=16): 100% 3000/3000 [00:00<00:00, 4082.23 examples/s]
Running tokenizer on dataset (num_proc=16): 100% 3000/3000 [01:15<00:00, 39.64 examples/s]
input_ids: [33975, 25, 220, 100662, 108136, 101124, 45139, 8997, 71703, 25, 220, 114566, 100662, 108136, 101124, 45139, 48443, 16, 13, 220, 100662, 101099, 99600, 1773, 101922, 99190, 102618, 106214, 101079, 3837, 29524, 111261, 5373, 107530, 57191, 107140, 3837, 26232, 101902, 114718, 99722, 3837, 101138, 105640, 101102, 90395, 105767, 101940, 107235, 3407, 17, 13, 4891, 251, 229, 99967, 104579, 1773, 101922, 105086, 104838, 9370, 104451, 5373, 104618, 5373, 35987, 100203, 52853, 33108, 105349, 104982, 99285, 9370, 107151, 102153, 3837, 101153, 44636, 100443, 5373, 44636, 105349, 33108, 101130, 101083, 3837, 23031, 100662, 108136, 104579, 100784, 3407, 18, 13, 10236, 251, 94, 101519, 103119, 1773, 105552, 113357, 99722, 107940, 3837, 113459, 101922, 50511, 101907, 220, 22, 12, 23, 58230, 237, 13343, 9370, 105552, 1773, 104205, 105552, 105767, 106104, 101950, 3837, 101902, 101099, 102005, 90395, 100627, 108260, 33108, 118836, 1773, 151645]
inputs: Human: 保持健康的三个提示。
Assistant: 以下是保持健康的三个提示:

  1. 保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。

  2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。

  3. 睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。<|im_end|>

label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 114566, 100662, 108136, 101124, 45139, 48443, 16, 13, 220, 100662, 101099, 99600, 1773, 101922, 99190, 102618, 106214, 101079, 3837, 29524, 111261, 5373, 107530, 57191, 107140, 3837, 26232, 101902, 114718, 99722, 3837, 101138, 105640, 101102, 90395, 105767, 101940, 107235, 3407, 17, 13, 4891, 251, 229, 99967, 104579, 1773, 101922, 105086, 104838, 9370, 104451, 5373, 104618, 5373, 35987, 100203, 52853, 33108, 105349, 104982, 99285, 9370, 107151, 102153, 3837, 101153, 44636, 100443, 5373, 44636, 105349, 33108, 101130, 101083, 3837, 23031, 100662, 108136, 104579, 100784, 3407, 18, 13, 10236, 251, 94, 101519, 103119, 1773, 105552, 113357, 99722, 107940, 3837, 113459, 101922, 50511, 101907, 220, 22, 12, 23, 58230, 237, 13343, 9370, 105552, 1773, 104205, 105552, 105767, 106104, 101950, 3837, 101902, 101099, 102005, 90395, 100627, 108260, 33108, 118836, 1773, 151645]
labels: 以下是保持健康的三个提示:

  1. 保持身体活动。每天做适当的身体运动,如散步、跑步或游泳,能促进心血管健康,增强肌肉力量,并有助于减少体重。

  2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物,避免高糖、高脂肪和加工食品,以保持健康的饮食习惯。

  3. 睡眠充足。睡眠对人体健康至关重要,成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力,促进身体恢复,并提高注意力和记忆力。<|im_end|>

[INFO|configuration_utils.py:728] 2024-03-10 10:12:34,342 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/config.json
[INFO|configuration_utils.py:791] 2024-03-10 10:12:34,344 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen1.5-0.5B-Chat",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

[INFO|modeling_utils.py:3257] 2024-03-10 10:12:34,371 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/model.safetensors
[INFO|modeling_utils.py:1400] 2024-03-10 10:12:34,382 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-03-10 10:12:34,384 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}

[INFO|modeling_utils.py:3992] 2024-03-10 10:12:37,382 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4000] 2024-03-10 10:12:37,382 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen1.5-0.5B-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:800] 2024-03-10 10:12:37,654 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6c705984bb8b5591dd4e1a9e66e1a127965fd08d/generation_config.json
[INFO|configuration_utils.py:845] 2024-03-10 10:12:37,655 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "top_p": 0.8
}

03/10/2024 10:12:37 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
03/10/2024 10:12:37 - INFO - llmtuner.model.adapter - Fine-tuning method: Full
03/10/2024 10:12:37 - INFO - llmtuner.model.loader - trainable params: 463987712 || all params: 463987712 || trainable%: 100.0000
/usr/local/lib/python3.10/dist-packages/galore_torch/adamw.py:48: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  warnings.warn(
03/10/2024 10:12:38 - INFO - llmtuner.train.utils - Using GaLore optimizer, may cause hanging at the start of training, wait patiently.
[INFO|trainer.py:601] 2024-03-10 10:12:38,151 >> Using auto half precision backend
[INFO|trainer.py:1812] 2024-03-10 10:12:38,506 >> ***** Running training *****
[INFO|trainer.py:1813] 2024-03-10 10:12:38,506 >> Num examples = 2,700
[INFO|trainer.py:1814] 2024-03-10 10:12:38,506 >> Num Epochs = 3
[INFO|trainer.py:1815] 2024-03-10 10:12:38,506 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1818] 2024-03-10 10:12:38,506 >> Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:1819] 2024-03-10 10:12:38,506 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1820] 2024-03-10 10:12:38,506 >> Total optimization steps = 8,100
[INFO|trainer.py:1821] 2024-03-10 10:12:38,507 >> Number of trainable parameters = 463,987,712
  0% 0/8100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/LLaMA-Factory/examples/extras/galore/../../../src/train_bash.py", line 14, in <module>
    main()
  File "/content/LLaMA-Factory/examples/extras/galore/../../../src/train_bash.py", line 5, in main
    run_exp()
  File "/content/LLaMA-Factory/src/llmtuner/train/tuner.py", line 32, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/content/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 73, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1961, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2911, in training_step
    self.accelerator.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1964, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/content/LLaMA-Factory/src/llmtuner/train/utils.py", line 228, in optimizer_hook
    optimizer_dict[param].step()
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/galore_torch/adamw.py", line 96, in step
    grad = state["projector"].project(grad, state["step"])
  File "/usr/local/lib/python3.10/dist-packages/galore_torch/galore_projector.py", line 15, in project
    if full_rank_grad.shape[0] >= full_rank_grad.shape[1]:
IndexError: tuple index out of range
  0% 0/8100 [00:02<?, ?it/s]

Expected behavior

No response

System Info

Others

No response

hiyouga commented 8 months ago

GaLore currently does not support bias terms, so we have disabled it for bias gradients.
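
For anyone still on an older commit, a minimal sketch of that workaround, assuming you assemble the parameter groups yourself: route only the 2-D weight matrices of the targeted modules through GaLore and leave 1-D tensors (biases, norm weights) to a regular optimizer. The stand-in model and the module-name matching (mirroring --galore_target mlp,self_attn) are illustrative, not LLaMA-Factory's actual API:

import torch

# Stand-in for the loaded model; a real run would use the Qwen2 checkpoint.
model = torch.nn.ModuleDict({
    "self_attn": torch.nn.Linear(8, 8, bias=True),
    "mlp": torch.nn.Linear(8, 8, bias=True),
    "norm": torch.nn.LayerNorm(8),
})

galore_params, regular_params = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    # GaLore's projector expects 2-D gradients, so 1-D tensors are excluded.
    if param.ndim >= 2 and any(t in name for t in ("mlp", "self_attn")):
        galore_params.append(param)
    else:
        regular_params.append(param)

print(len(galore_params), len(regular_params))  # 2 weight matrices, 4 one-dim tensors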