gregor-ge / mBLIP

MIT License
85 stars · 7 forks

Question regarding quantized / mixed precision training with lightning #10

Closed · floschne closed this 10 months ago

floschne commented 10 months ago

Hi! Thanks for publishing this awesome work, it's very inspirational for me :)

I'm just trying to understand your codebase and have a question about the quantized / mixed-precision training with Lightning:

At https://github.com/gregor-ge/mBLIP/blob/main/src/modules/modeling/mblip.py#L485: why do you have to use autocast here? Isn't it applied automatically by Lightning?
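For reference, the pattern I mean looks roughly like this (a paraphrased sketch, not the exact mBLIP code; `llm_forward` and the tensor names are just placeholders):

```python
import torch
from torch import nn

def llm_forward(llm: nn.Module, inputs_embeds, attention_mask, labels):
    # The LLM call is wrapped in an explicit bf16 autocast, even though
    # Lightning already manages mixed precision for the training step.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        return llm(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            labels=labels,
        )
```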

gregor-ge commented 10 months ago

Hi!

Happy to hear you like this project.

Some of the LLMs I used (like the T5 models) have to run in bf16 to work correctly, and the LAVIS implementation uses something like this to produce correct results even if you otherwise train with fp16 (https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip2_models/blip2_t5.py). You could probably remove it and most models would still work, or set bf16 as the precision in the Lightning config.
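For illustration, a minimal sketch of both options (the exact precision flag depends on your Lightning version, and `maybe_autocast` here is a paraphrase of the LAVIS-style helper, not the actual mBLIP code):

```python
import contextlib

import torch
import pytorch_lightning as pl

# Option 1: train in bf16 mixed precision globally via the Trainer, which
# should make the explicit autocast in the model redundant
# (older Lightning versions use precision="bf16" instead of "bf16-mixed").
trainer = pl.Trainer(precision="bf16-mixed")

# Option 2: keep fp16 (or fp32) training overall, but force bf16 just
# around the numerically sensitive LLM forward pass, similar in spirit
# to the maybe_autocast helper in LAVIS.
def maybe_autocast(device: torch.device, dtype=torch.bfloat16):
    # Autocast only applies on CUDA; elsewhere fall back to a no-op context.
    if device.type == "cuda":
        return torch.autocast(device_type="cuda", dtype=dtype)
    return contextlib.nullcontext()
```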

floschne commented 10 months ago

Ah okay! Got it, thanks for the explanation. :)