Open sachag678 opened 1 year ago
oh oh
I'm guessing it has something to do with the Python versions?
Yes, I have an issue with Colab, but I don't have an issue with the local VS Code Jupyter notebook.
The local Jupyter notebook version is Python 3.7.13.
The tested Colab notebook version is Python 3.7.14 (default, Sep 8 2022, 00:06:44) [GCC 7.5.0].
If the difference is that small, maybe it is fine to accept it within some tolerance? Colab tested notebook: https://colab.research.google.com/drive/1HmZ8bgtAfvyMaZyu3Sr1Bgxsj35jitTs?usp=sharing
Maybe the issue is the PyTorch version?
I used `t.grad.sum()` and `dt.sum()` to compare the sums between Colab and the local notebook: colab.txt local.txt
I posted it on the PyTorch forum and got no answer: https://discuss.pytorch.org/t/numerical-instability-in-google-colab/163610. I am planning to post it on the Colab GitHub issues.
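One way to accept a tiny residual diff is an allclose-style tolerance check. A minimal numpy sketch (`cmp_approx` is a hypothetical helper for illustration, not the lecture's actual `cmp` function; the perturbation value is chosen near float32 precision):

```python
import numpy as np

def cmp_approx(dt, grad, atol=1e-8):
    """Hypothetical helper: report exact match, approximate match, and maxdiff."""
    exact = np.array_equal(dt, grad)
    approx = np.allclose(dt, grad, atol=atol)
    maxdiff = float(np.abs(dt - grad).max())
    return exact, approx, maxdiff

a = np.float32([1.0, 2.0, 3.0])
b = a + np.float32(1e-7)  # tiny perturbation near float32 precision

exact, approx, maxdiff = cmp_approx(a, b)
print(exact, approx, maxdiff)  # not bit-exact, but approximately equal
```

Note that `np.allclose` also applies a relative tolerance (`rtol`, default 1e-5), so a maxdiff around 1e-7 on values of order 1 still counts as "approximately equal".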
I am getting exactly the same maxdiff for `hpreact`, and my notebook is running on a local machine: Python 3.9.13 and `torch.__version__` '1.12.1'.
I've got a strange observation (using the Colab version):
`dlogit_maxes = -dnorm_logits.sum(dim=1, keepdim=True)`
gives me exact equality, while
`dlogit_maxes = -dnorm_logits.sum(dim=1)`
gives approximate equality with a maxdiff ~ 10^-8.
In this example, if the shapes of the gradients are not equal but the comparison is made after broadcasting (I guess), there is a residual difference; otherwise the values are exactly equal. It might have to do with the accuracy limitations of floating-point operations: the values here are float32, and 10^-8 is close to the precision limit for float32 operations.
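For what it's worth, `keepdim` only changes the shape of the sum, not its values, so any residual diff has to come from how the mismatched shapes broadcast in later steps. A small numpy sketch (random stand-in data, not the actual gradients from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
dnorm_logits = rng.standard_normal((32, 27)).astype(np.float32)

with_keep = -dnorm_logits.sum(axis=1, keepdims=True)  # shape (32, 1)
without_keep = -dnorm_logits.sum(axis=1)              # shape (32,)

print(with_keep.shape, without_keep.shape)
# the summed values themselves are bit-identical
print(np.array_equal(with_keep.ravel(), without_keep))
```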
I've made a PR for the `cmp` function to output a comparison of shapes; it could probably be useful: https://github.com/karpathy/nn-zero-to-hero/pull/36
Another thing is that maybe what matters is the order of the arithmetic operations. Apparently addition and multiplication of floats are not associative: https://pytorch.org/docs/stable/notes/numerical_accuracy.html
The doc also says that results may be inconsistent across devices and across software commits.
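That non-associativity is easy to see in float32 with numbers chosen so the rounding bites (plain numpy, no PyTorch needed):

```python
import numpy as np

x = np.float32(1e8)
y = np.float32(1.0)

# float32 has ~7 significant digits, so 1e8 + 1 rounds back to 1e8
# and y is lost in the first grouping but survives in the second
grouped_left = (x + y) - x   # -> 0.0
grouped_right = (x - x) + y  # -> 1.0

print(grouped_left, grouped_right)
```

The same mechanism, applied across thousands of additions in a reduction, is why summing in a different order (or on a different device) can shift a gradient by ~1e-8.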
I had the same difference problem between gradients when running locally, because I used the GPU to store tensors and perform computations. Once I switched to the CPU, I still had differences in the later computations because of the ordering of operations. I managed to get the exact gradients by running on the CPU and reordering the computations to match the lecture.
I ran into an interesting issue in makemore part 4 (backprop ninja) where `dhpreact` was not exactly matching `hpreact.grad`.
However, this was only in the Colab notebook, because when I put the same code into a local Jupyter notebook it works fine.
Not sure why this would be the case, but it's an odd curiosity.