Closed andysingal closed 9 months ago
Please provide a full reproducer and a reason why this all should fit in the 16GB GPU you have available.
@sgugger Here is the full code (I hope you got the shared link). While going over the lmsys repo, I found that they are still doing research on StableVicuna + QLoRA. I tried LoraConfig, but its `target_modules` did not work for me. I tried PromptConfig instead, since I was working with a Human/bot format. Please let me know if you have any further questions or concerns.
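(Editorial note, not part of the original thread: the `LoraConfig` parameter is named `target_modules`, and the PEFT library matches each entry as a suffix of a module's dotted name. A minimal stdlib-only sketch of that matching rule, assuming the usual LLaMA/Vicuna projection names `q_proj`/`v_proj`:)

```python
# Sketch of how PEFT's LoraConfig resolves `target_modules`
# (the parameter is `target_modules`, not `target_variables`):
# each entry must equal, or match the end of, a module's dotted name.
def matches_target(module_name, target_modules):
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in target_modules
    )

# Typical attention projections in a LLaMA/Vicuna-style model:
targets = ["q_proj", "v_proj"]
print(matches_target("model.layers.0.self_attn.q_proj", targets))  # True
print(matches_target("model.layers.0.mlp.gate_proj", targets))     # False
```

If `target_modules` seems to have no effect, printing the model's `named_modules()` and checking the names against this rule is usually the quickest way to spot a mismatch.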
Hello, could you please reshare the minimal reproducer: code, command you are using to launch the training, the hardware as well as the versions of PyTorch, Transformers, Accelerate and PEFT?
Thanks for your response. Here is the colab notebook: https://colab.research.google.com/drive/1By1tOO6HE5Oopj2prr3tkDduewDFNpZu?usp=sharing @pacman100 @sgugger
Any updates, @pacman100 @sgugger?
I think the best person for this issue would be @SunMarc, as the user is trying to use AutoGPTQ along with PEFT Prompt Tuning.
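(Editorial aside on what PEFT Prompt Tuning does, as a conceptual sketch rather than the PEFT API: a small trainable matrix of `num_virtual_tokens` embeddings is prepended to the input embeddings, and only that matrix is updated during training, while the quantized base model stays frozen. Illustrated with plain lists and illustrative names:)

```python
# Conceptual sketch of prompt tuning: prepend `num_virtual_tokens`
# trainable embeddings to the input embeddings; the base model's
# weights are never touched.
num_virtual_tokens = 8
hidden_size = 4
seq_len = 16

# The only trainable parameters: one vector per virtual token.
prompt_embeddings = [[0.0] * hidden_size for _ in range(num_virtual_tokens)]

# Frozen input embeddings produced by the base model's embedding layer.
input_embeddings = [[1.0] * hidden_size for _ in range(seq_len)]

# The model sees the concatenated sequence.
combined = prompt_embeddings + input_embeddings
print(len(combined))  # 24 = num_virtual_tokens + seq_len
```

This is why prompt tuning pairs naturally with a GPTQ-quantized base model: the frozen weights can stay quantized, and only the small float prompt matrix needs gradients.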
When trying it on Colab with a T4 GPU, I get the error below, which is probably related to Flash Attention:
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-e3a673c6a851> in <cell line: 38>()
     36 # print("\n If there's a warning about missing keys above, please disregard :)")
     37
---> 38 trainer.train()
     39 gc.collect()
     40 torch.cuda.empty_cache()

5 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1554     hf_hub_utils.enable_progress_bars()
   1555 else:
-> 1556     return inner_training_loop(
   1557         args=args,
   1558         resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1870
   1871 with self.accelerator.accumulate(model):
-> 1872     tr_loss_step = self.training_step(model, inputs)
   1873
   1874 if (

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in training_step(self, model, inputs)
   2746     scaled_loss.backward()
   2747 else:
-> 2748     self.accelerator.backward(loss)
   2749
   2750 return loss.detach() / self.args.gradient_accumulation_steps

/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
   1984     self.scaler.scale(loss).backward(**kwargs)
   1985 else:
-> 1986     loss.backward(**kwargs)
   1987
   1988 def set_trigger(self):

/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    490     inputs=inputs,
    491 )
--> 492 torch.autograd.backward(
    493     self, gradient, retain_graph, create_graph, inputs=inputs
    494 )

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    249 # some Python versions print out the first line of a multi-line function
    250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252     tensors,
    253     grad_tensors_,

RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
The notebook that I am trying out has a few changes on top of what the user shared above: https://colab.research.google.com/drive/1UDoYUoSK-YJoFMwEzClhNxbeyBkv5aza?usp=sharing
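(Editorial note on the `is_sm80 || is_sm90` message: PyTorch's flash-attention backward kernel is gated on the GPU's compute capability and requires sm80 (A100) or sm90 (H100); a Colab T4 is sm75, so the backward pass fails. A stdlib-only sketch of that gate:)

```python
# Sketch of the capability gate behind the error: flash-attention
# backward is only enabled on sm80 (A100) and sm90 (H100) devices.
def flash_attn_backward_supported(major, minor):
    is_sm80 = major == 8 and minor == 0
    is_sm90 = major == 9 and minor == 0
    return is_sm80 or is_sm90

print(flash_attn_backward_supported(7, 5))  # T4 (sm75)  -> False
print(flash_attn_backward_supported(8, 0))  # A100 (sm80) -> True
```

If that is indeed the cause, a common workaround on pre-Ampere GPUs is to avoid the flash-attention path for training, for example by loading the model with `attn_implementation="eager"` in recent Transformers versions; I have not verified this for the notebook above.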
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Kaggle notebook
Who can help?
@pacman100 @sgu
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
got error:
Expected behavior

I would like the model to train, but Vicuna does not support QLoRA... I am using PromptConfig + Vicuna.