rabiulcste opened this issue 5 months ago
cc @pacman100 @muellerzr, as the error appears to be Trainer + QLoRA related.
same error
I found a solution: remove `torch_dtype`, and it should work fine!
```python
from transformers import Idefics2ForConditionalGeneration

# bnb_config, args, and USE_QLORA come from the surrounding training script
model = Idefics2ForConditionalGeneration.from_pretrained(
    args.model_name,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=bnb_config if USE_QLORA else None,
)
```
I'm facing the same issue with `torch_dtype=torch.float16`.
> I found a solution: remove `torch_dtype`, and it should work fine!
If `torch_dtype=torch.float16` is removed, the model weights take double the memory to load. Is there any way to train with fp16 weights and LoRA?
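For what it's worth, one commonly suggested workaround for this error is to keep the base weights in fp16 but upcast only the trainable LoRA parameters to fp32, so the grad scaler can unscale their gradients. A minimal sketch (the checkpoint name and LoRA hyperparameters here are illustrative, not from this thread):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import Idefics2ForConditionalGeneration

model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b",   # assumed checkpoint name
    torch_dtype=torch.float16,     # base weights stay in fp16 (half the memory)
    device_map="auto",
    low_cpu_mem_usage=True,
)

lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)

# Upcast only the trainable (LoRA) parameters to fp32; fp16 gradients on
# trainable parameters are what makes GradScaler.unscale_() raise.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.to(torch.float32)
```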
cc @muellerzr @SunMarc
Another ping @muellerzr @SunMarc
System Info
`transformers` version: 4.40.0.dev0

Who can help?
@amyeroberts
Information
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
During the training loop, when `accelerator.clip_grad_norm_()` is called, it triggers an unscale operation that fails because the gradients are in FP16. This points to an issue in how gradient scaling is handled under mixed-precision settings.
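For context, the unscale failure can be reproduced outside the Trainer. A minimal sketch of the failure mode (my own illustration, not the actual training loop; needs a CUDA GPU):

```python
import torch

model = torch.nn.Linear(4, 4).cuda().half()              # fp16 parameters
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(2, 4, device="cuda", dtype=torch.float16)
scaler.scale(model(x).sum()).backward()                  # fp16 gradients

# Accelerate's clip_grad_norm_ calls unscale_ internally; with fp16
# gradients this raises "ValueError: Attempting to unscale FP16 gradients."
scaler.unscale_(optimizer)
```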
Expected behavior
This doesn't happen with QLoRA set to True. I'd expect the model to fine-tune without error.