Zero Loss: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values

Hi GaLore Team, congratulations for the interesting work!

I am trying to fine-tune llama-3 8B model using GaLore but getting this error: torch._C._LinAlgError: linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values.

Interestingly first batch loss is non-zero and all subsequent losses are zero values before training is automatically terminated.

Full Error Log

Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before starting. Please be patient !
model.layers.0.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.0.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.0.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.0.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.16.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.16.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.16.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.16.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.17.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.17.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.17.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.17.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.18.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.18.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.18.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.18.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.19.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.19.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.19.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.19.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.20.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.20.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.20.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.20.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.21.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.21.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.21.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.21.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.22.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.22.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.22.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.22.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.23.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.23.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.23.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.23.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.24.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.24.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.24.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.24.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.25.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.25.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.25.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.25.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.26.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.26.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.26.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.26.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.27.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.27.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.27.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.27.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.28.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.28.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.28.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.28.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.29.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.29.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.29.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.29.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.30.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.30.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.30.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.30.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.31.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.31.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.31.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.31.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
0%|                                                                                                                                                                            | 0/6719320 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
0%|                                                                                                                                                            | 1/6719320 [06:15<701609:55:03, 375.90s/it][2024-07-23 07:19:54,094] [INFO] [axolotl.callbacks.on_step_end:128] [PID:148509] [RANK:0] GPU memory usage while training: 17.607GB (+15.215GB cache, +1.482GB misc)
{'loss': 1.694, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                      
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
0%|                                                                                                                                                             | 200/6719320 [08:51<1455:09:36,  1.28it/s]/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py:83: UserWarning: torch.linalg.svd: During SVD computation with the selected cusolver driver, batches 0 failed to converge. A more accurate method will be used to compute the SVD as a fallback. Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html (Triggered internally at ../aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp:697.)
U, s, Vh = torch.linalg.svd(matrix, full_matrices = False)
Traceback (most recent call last):
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  return _run_code(code, main_globals, None,
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/runpy.py", line 86, in _run_code
  exec(code, run_globals)
File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 72, in <module>
  fire.Fire(do_cli)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
  component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
  component, remaining_args = _CallAndUpdateTrace(
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
  component = fn(*varargs, **kwargs)
File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 39, in do_cli
  return do_train(parsed_cfg, parsed_cli_args)
File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 67, in do_train
  return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/train.py", line 191, in train
  trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
  return inner_training_loop(
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
  tr_loss_step = self.training_step(model, inputs)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 3324, in training_step
  self.accelerator.backward(loss, **kwargs)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/accelerator.py", line 2151, in backward
  loss.backward(**kwargs)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
  torch.autograd.backward(
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
  _engine_run_backward(
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 1398, in optimizer_hook
  optimizer_dict[param].step()
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
  return wrapped(*args, **kwargs)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/optim/optimizer.py", line 391, in wrapper
  out = func(*args, **kwargs)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
  return func(*args, **kwargs)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/adamw8bit.py", line 58, in step
  grad = state["projector"].project(p.grad, state["step"])
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 21, in project
  self.ortho_matrix = self.get_orthogonal_matrix(full_rank_grad, self.rank, type='left')
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 83, in get_orthogonal_matrix
  U, s, Vh = torch.linalg.svd(matrix, full_matrices = False)
torch._C._LinAlgError: linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 1023).
0%|                                                                                                                                                             | 200/6719320 [13:19<7465:36:33,  4.00s/it]
Traceback (most recent call last):
File "/home/minimalist/miniconda3/envs/comps/bin/accelerate", line 8, in <module>
  sys.exit(main())
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
  args.func(args)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
  simple_launcher(args)
File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
  raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/minimalist/miniconda3/envs/comps/bin/python', '-m', 'axolotl.cli.train', 'examples/llama-3/qlora.yml']' returned non-zero exit status 1.

Hyperparams


base_model: meta-llama/Meta-Llama-3-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

datasets:

path: type: sharegpt conversation: llama-3 field_human: human field_model: gpt

dataset_prepared_path: val_set_size: 0.01 output_dir: ./outputs/galore-out

sequence_len: 2048 sample_packing: false eval_sample_packing: true pad_to_sequence_len: true

gradient_accumulation_steps: 1 micro_batch_size: 1 num_epochs: 4 optimizer: galore_adamw_8bit_layerwise lr_scheduler: cosine learning_rate: 0.000001

optim_target_modules:

self_attn
mlp

train_on_inputs: false group_by_length: false bf16: true tf32: false

bfloat16: true

logging_steps: 4 flash_attention: true

jiaweizzhao / GaLore

Zero Loss: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values #58

Full Error Log

Hyperparams