stas00 / ml-engineering

Machine Learning Engineering Open Book
https://stasosphere.com/machine-learning/
Creative Commons Attribution Share Alike 4.0 International

grad checkpoint tiny error #64

Closed: baochi0212 closed this 2 months ago

baochi0212 commented 2 months ago

Tiny error about speed for gradient checkpointing.

stas00 commented 2 months ago

This is a tricky one: it does improve the overall speed, because a much bigger batch size (BS) can be used when this feature is on.

Maybe we should put a Yes* next to it and write a note under the table:

It slows things down for the given batch size, but since it frees up a lot of memory, enabling a much larger BS, it actually improves the overall speed.

What do you think?

baochi0212 commented 2 months ago

Yeah, I agree. I thought it must be "No" for a fixed batch size. OK, I updated the PR.

stas00 commented 2 months ago

Thinking more about it, I think a No* is probably more correct, as you suggested.

stas00 commented 2 months ago

Thank you for this contribution, @baochi0212!
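
For readers of this thread, the trade-off discussed above (slower at a fixed batch size, potentially faster overall once a larger batch size fits in memory) can be illustrated with a toy throughput model. This is only a sketch under stated assumptions: the function `samples_per_second`, the fixed per-step overhead, the per-sample cost, the recompute factor, and the batch sizes are all hypothetical values picked for illustration, not measurements from the book or from any real run.

```python
def samples_per_second(batch_size, fixed_overhead_s, per_sample_s, recompute_factor=1.0):
    """Toy model: step time = fixed per-step overhead + per-sample compute.

    recompute_factor > 1.0 models the extra forward recomputation that
    gradient checkpointing adds to each step.
    """
    step_time_s = fixed_overhead_s + batch_size * per_sample_s * recompute_factor
    return batch_size / step_time_s


# All numbers below are hypothetical assumptions, not benchmarks.
FIXED_OVERHEAD_S = 0.5    # assumed per-step cost that does not scale with batch size
PER_SAMPLE_S = 0.05       # assumed compute time per sample without checkpointing
RECOMPUTE_FACTOR = 1.33   # assumed slowdown from recomputing activations
SMALL_BS = 8              # assumed largest batch that fits without checkpointing
LARGE_BS = 64             # assumed largest batch that fits with checkpointing

baseline = samples_per_second(SMALL_BS, FIXED_OVERHEAD_S, PER_SAMPLE_S)
ckpt_same_bs = samples_per_second(SMALL_BS, FIXED_OVERHEAD_S, PER_SAMPLE_S, RECOMPUTE_FACTOR)
ckpt_large_bs = samples_per_second(LARGE_BS, FIXED_OVERHEAD_S, PER_SAMPLE_S, RECOMPUTE_FACTOR)

print(f"no checkpointing, BS={SMALL_BS}: {baseline:.2f} samples/s")
print(f"checkpointing,    BS={SMALL_BS}: {ckpt_same_bs:.2f} samples/s")
print(f"checkpointing,    BS={LARGE_BS}: {ckpt_large_bs:.2f} samples/s")
```

Under these assumed numbers, checkpointing is slower at the same batch size, which matches the No* reading, while the larger batch size it enables gives higher throughput overall. Whether that gain shows up in practice depends on the model and hardware, so it is worth measuring on the actual setup.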