I found out that tanh wasn't implemented in the repo, so I copied Karpathy's code from the video into the Value class as a method.
Using this nonlinearity should let you train better when your targets include negative numbers. I found this out the hard way: I tried to replicate the video's dataset with relu instead and my loss stayed really high, presumably because relu never outputs anything below 0 while tanh covers the range -1 to 1.
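For anyone who wants to see roughly what that looks like, here is a minimal sketch, not a verbatim copy of the video code. The `Value` class below is a stripped-down stand-in I'm assuming for illustration (only the fields `tanh` needs), and it uses `math.tanh` rather than writing out the exponentials by hand.

```python
import math


class Value:
    """Stripped-down stand-in for the repo's Value class (assumed layout)."""

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # leaf nodes have nothing to propagate
        self._prev = set(_children)
        self._op = _op

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')

        def _backward():
            # d/dx tanh(x) = 1 - tanh(x)^2, chained with the upstream gradient
            self.grad += (1 - t ** 2) * out.grad

        out._backward = _backward
        return out


# Usage: a negative input stays negative after tanh (relu would clamp it to 0)
x = Value(-0.5)
y = x.tanh()
y.grad = 1.0      # seed the output gradient by hand
y._backward()     # fills in x.grad with the local derivative
```

In the real class you would only add the `tanh` method and call it wherever the neuron applies its nonlinearity.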
hope this helps! -naren