Closed · Abhrant closed this issue 4 weeks ago
Tagging huggingface/peft#1834 for reference.
Did you try out what I mentioned in my comment? This is not something that accelerate can solve automatically for you.
Fixed it!
nn.Linear.__init__(self, in_features, out_features, **kwargs) was initializing the new layer's parameters in fp32 by default. I now pass the dtype argument through kwargs and it works!
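A minimal sketch of what that fix can look like, assuming the SSF parameters are created in a subclass of nn.Linear; the names SSFLinear, ssf_scale, and ssf_shift are illustrative, not the exact code from this issue:

```python
import torch
import torch.nn as nn

class SSFLinear(nn.Linear):
    def __init__(self, in_features, out_features, **kwargs):
        # Forwarding dtype/device via **kwargs keeps the base Linear
        # parameters in the caller's dtype instead of the fp32 default.
        nn.Linear.__init__(self, in_features, out_features, **kwargs)
        dtype = kwargs.get("dtype")  # e.g. torch.float16 when the base model is fp16
        # The new SSF parameters must be created in the same dtype,
        # otherwise FSDP cannot flatten them together with the base weights.
        self.ssf_scale = nn.Parameter(torch.ones(out_features, dtype=dtype))
        self.ssf_shift = nn.Parameter(torch.zeros(out_features, dtype=dtype))
```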
Thanks @BenjaminBossan
I am trying to use the PEFT method SSF (https://arxiv.org/pdf/2210.08823) to finetune large models with FSDP. This is my code for creating the new layers:
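A minimal sketch of an SSF-style layer along these lines, with illustrative names (SSFLinear, ssf_scale, ssf_shift); when no explicit dtype is passed through, the new parameters default to fp32 even if the base model was loaded in fp16, which is what triggers the FSDP error below:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSFLinear(nn.Linear):
    """SSF-style linear layer: the frozen base output is rescaled and
    shifted by small learnable per-feature parameters."""

    def __init__(self, in_features, out_features, **kwargs):
        nn.Linear.__init__(self, in_features, out_features, **kwargs)
        # Created without an explicit dtype, these default to torch.float32
        # even when the surrounding model is fp16.
        self.ssf_scale = nn.Parameter(torch.ones(out_features))
        self.ssf_shift = nn.Parameter(torch.zeros(out_features))
        # Only the scale/shift parameters are trained.
        self.weight.requires_grad = False
        if self.bias is not None:
            self.bias.requires_grad = False

    def forward(self, x):
        out = F.linear(x, self.weight, self.bias)
        return out * self.ssf_scale + self.ssf_shift
```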
This code runs on a single GPU without any issues. But when I try to run the same code on multiple GPUs (L4) using FSDP via accelerate, I keep getting this error: ValueError: Must flatten tensors with uniform dtype but got torch.float16 and torch.float32
I don't understand why this keeps happening. Could someone help, please?
PS: I have also tried explicitly converting parameters to .float() or .half() based on the original param dtype, but that didn't help.
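One way such a cast could look (a hedged sketch; the single target_dtype and the stand-in model are assumptions, not the exact code from this issue), which did not resolve the flattening error here:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the actual mixed fp16/fp32 model

# Assumption: force every parameter to one dtype before FSDP wrapping.
target_dtype = torch.float16
for param in model.parameters():
    param.data = param.data.to(target_dtype)
```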