Hi,

First of all, thanks for your work on fixing gradient accumulation! I have a question about the implementation in unsloth-zoo here. In the blog post https://unsloth.ai/blog/gradient you say:
> This means naively averaging over each gradient accumulation step is wrong, but instead we must derive the denominator beforehand.
But checking your code, I can see that you simply add up the losses, while the denominator is commented out: https://github.com/unslothai/unsloth-zoo/blob/7b0048e53a6239bdad76cad66bf2490f6a2f8a9b/unsloth_zoo/training_utils.py#L268-L270
Shouldn't the loss be multiplied by the denominator here to match the "After - Unsloth fix" graph?
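To make my question concrete, here is a toy sketch (plain Python, hypothetical function names, not your actual code) of the difference the blog post describes: averaging each micro-batch's mean loss over-weights tokens from short batches, whereas summing all per-token losses and dividing by a denominator derived beforehand gives the true per-token mean.

```python
def naive_accumulated_loss(batches):
    # Naive: average the per-batch mean losses over the accumulation steps.
    # Each batch gets equal weight regardless of how many tokens it holds.
    return sum(sum(b) / len(b) for b in batches) / len(batches)

def fixed_accumulated_loss(batches):
    # Fix described in the blog post: sum every per-token loss and divide
    # by the total (non-padded) token count, derived before the division.
    total = sum(sum(b) for b in batches)
    n_tokens = sum(len(b) for b in batches)
    return total / n_tokens

# Two micro-batches with different numbers of non-padded tokens.
batches = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0]]
print(naive_accumulated_loss(batches))  # 2.0  (biased toward the short batch)
print(fixed_accumulated_loss(batches))  # ~1.667  (true mean over all 6 tokens)
```

The two results only agree when every micro-batch has the same token count, which is exactly why the denominator matters here.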