Closed: zhangvia closed this issue 8 months ago
I haven't tried the 8bit optimizer, but I did try mixed precision training and I found it to be less effective compared to full precision training.
Besides, you scale the learning rate after the optimizer was created. How does the scaled learning rate take effect?
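For what it's worth, scaling the learning rate after the optimizer is constructed only works if the new value is written back into the optimizer's `param_groups` (a variable holding the old lr has no effect). A minimal sketch, where the scale factor of 8 and the base lr are made-up illustration values:

```python
import torch

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

scale = 8  # hypothetical scale, e.g. num_gpus * gradient_accumulation_steps

# Writing into param_groups is what actually changes the lr the
# optimizer uses on the next step().
for group in optimizer.param_groups:
    group["lr"] = group["lr"] * scale

print(optimizer.param_groups[0]["lr"])
```

If the code scales a local lr variable before passing it to the optimizer constructor, that is equally fine; the only broken pattern is scaling a variable after construction without touching `param_groups`.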
I don't want it to work...
> I haven't tried the 8bit optimizer, but I did try mixed precision training and I found it to be less effective compared to full precision training.
@guoqincode, did you try FP16 or BF16?
I tried the 8-bit Adam optimizer, and I can train stage one on a 40 GB A100. I think it helps reduce VRAM usage, but I don't know whether it hurts model performance. What do you think? Did you try 8-bit Adam?
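In case it helps anyone reproduce this, switching to 8-bit Adam is usually a one-line change via bitsandbytes, which stores the optimizer moment buffers in 8-bit and so cuts optimizer-state VRAM by roughly 4x compared to fp32 Adam states. A sketch, assuming bitsandbytes is installed (the tiny `nn.Linear` and the lr are placeholder values, not the repo's actual model or config):

```python
import torch
import torch.nn as nn

# stand-in module; in the actual training script this would be the stage-one model
model = nn.Linear(320, 320)

try:
    import bitsandbytes as bnb
    # 8-bit AdamW: moment buffers kept in block-wise quantized 8-bit,
    # which is where the optimizer-state VRAM saving comes from
    optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-5)
except ImportError:
    # fallback so the sketch still runs without bitsandbytes
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

print(type(optimizer).__name__, optimizer.param_groups[0]["lr"])
```

The rest of the training loop (`loss.backward()`, `optimizer.step()`, `optimizer.zero_grad()`) is unchanged. Note that in the bitsandbytes authors' experiments 8-bit optimizers matched fp32 optimizer quality on their benchmarks, but it is worth validating on this task specifically.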