rentruewang / koila

Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code.
https://rentruewang.com/koila/
Apache License 2.0

wrong error in getting-started.py #16

Open feifeibear opened 2 years ago

feifeibear commented 2 years ago

Hello, I noticed you fixed the lazy label bug, and getting-started.py now runs. However, it does not pass the assertion: the gradient difference is quite large!

assert all( [print(torch.max(grad - lazy_grad)) for (grad, lazy_grad) in zip(grads, lazy_grads)] )

tensor(0.0698) tensor(0.0227) tensor(0.0717) tensor(0.0415) tensor(0.5402) tensor(0.7869)
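A side note on the check itself: `print` returns `None`, so `all([...])` over its results is `False` no matter how close the gradients are, and `torch.max(grad - lazy_grad)` looks at the signed difference rather than its magnitude. The following is a minimal sketch of a tolerance-based comparison, using plain Python lists in place of tensors so it runs without PyTorch. The helper `max_abs_diff`, the tolerance `TOL`, and the sample values are all hypothetical and not taken from getting-started.py.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two flat lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical stand-ins for `grads` and `lazy_grads`; here they are identical.
grads = [[0.10, -0.20], [0.30]]
lazy_grads = [[0.10, -0.20], [0.30]]

# The pattern from the report: print() returns None, so all() sees only
# falsy values and the result is False even for identical inputs.
printed = all([print(max_abs_diff(g, lg)) for g, lg in zip(grads, lazy_grads)])

# A check that compares against a tolerance instead (TOL is an assumed value).
TOL = 1e-6
matches = all(max_abs_diff(g, lg) <= TOL for g, lg in zip(grads, lazy_grads))

print(printed, matches)
```

With real tensors, `torch.allclose(grad, lazy_grad)` would play the role of the tolerance comparison. The reported differences (up to 0.7869) are far larger than any reasonable tolerance, so the mismatch described above is real either way.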

feifeibear commented 2 years ago

BTW: on my machine, the batch of 9 is split into two minibatches, [0:8] and [8:9].

rentruewang commented 2 years ago

Hmm, that's quite weird. I'll look into it. I believe it may have something to do with mean/sum scaling. Appreciate the bug report!
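To illustrate the mean/sum scaling hypothesis, here is a minimal sketch in plain Python. It is not koila's implementation; the one-parameter model, the data, and the helper `mean_grad` are assumptions chosen only so the arithmetic is easy to follow. It shows that when a batch of 9 is split into [0:8] and [8:9] and each chunk uses a mean-reduced loss, adding the per-chunk gradients does not reproduce the full-batch gradient unless each chunk is weighted by its share of the batch.

```python
def mean_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Hypothetical one-parameter model and a batch of 9 samples.
w = 0.5
xs = [float(i) for i in range(1, 10)]
ys = [2.0 * x + 1.0 for x in xs]

# Reference: gradient of the mean loss over the whole batch.
full = mean_grad(w, xs, ys)

# The same split reported in the thread: [0:8] and [8:9].
chunks = [(xs[0:8], ys[0:8]), (xs[8:9], ys[8:9])]

# Naive accumulation: add the mean-reduced gradient of each chunk.
naive = sum(mean_grad(w, cx, cy) for cx, cy in chunks)

# Weighted accumulation: scale each chunk by its fraction of the batch.
weighted = sum(len(cx) / len(xs) * mean_grad(w, cx, cy) for cx, cy in chunks)

print(full, naive, weighted)
```

Here the single-sample chunk [8:9] contributes as much to the naive sum as the 8-sample chunk, which is one way unevenly sized minibatches could produce a large gradient mismatch. Whether this is the actual cause of the failing assertion is not confirmed in the thread.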