Closed: ddaspit closed this issue 10 months ago.
I'm getting an error when I try to do this: `ValueError: Attempting to unscale FP16 gradients.` From what I've been reading, you can't train a model loaded with `torch_dtype=torch.float16` because the optimization step still requires `float32` weights, and just using mixed precision is supposed to take care of everything, so I'm not sure this is something we can do.
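For reference, here is a minimal sketch of where the error comes from (plain PyTorch AMP rather than our actual training code, and it assumes a CUDA device): `GradScaler.unscale_` refuses to unscale FP16 gradients, which is exactly what you get when the parameters themselves were loaded in float16.

```python
import torch

# Model whose parameters are float16 (the problematic case).
model = torch.nn.Linear(8, 8).cuda().half()
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 8, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).sum()

scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # raises: ValueError: Attempting to unscale FP16 gradients.
scaler.step(optimizer)
scaler.update()
```

With the model left in the default float32 (drop the `.half()`), the same loop runs fine, because autocast only casts the forward-pass ops to fp16 while the master weights and gradients stay in fp32.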
When are you loading the model using `torch_dtype=torch.float16`? Before training or inferencing? Try using it only for inferencing.
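Something like this sketch for the inference-only path (the checkpoint name and language codes are just examples, not necessarily what we ship):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Example NLLB checkpoint; substitute whichever variant is actually used.
model_name = "facebook/nllb-200-distilled-600M"

tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
# Half-precision weights are fine for inference and roughly halve GPU memory.
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")
model.eval()

inputs = tokenizer("Hello, world!", return_tensors="pt").to("cuda")
with torch.no_grad():
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),  # target language
        max_new_tokens=32,
    )
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```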
We fine-tune HF models using mixed precision (`fp16`). In spite of this, I believe that models like NLLB are still loaded using `torch.float32` weights. We should try forcing the model to load using `torch.float16` to see if it reduces memory usage and increases inferencing speed. The `dtype` can be specified when the model is loaded.
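A minimal sketch of passing the dtype at load time and checking the memory impact (the checkpoint name is an example; `get_memory_footprint` is the standard `PreTrainedModel` helper):

```python
import torch
from transformers import AutoModelForSeq2SeqLM

# Example NLLB checkpoint; any variant we use should behave the same way.
model_name = "facebook/nllb-200-distilled-600M"

# Default load: weights come back as float32.
model_fp32 = AutoModelForSeq2SeqLM.from_pretrained(model_name)
print(model_fp32.dtype, round(model_fp32.get_memory_footprint() / 1e9, 2), "GB")

# dtype passed at load time: weights come back as float16, roughly half the memory.
model_fp16 = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float16)
print(model_fp16.dtype, round(model_fp16.get_memory_footprint() / 1e9, 2), "GB")
```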