Closed proteneer closed 6 years ago
We use the exponential loss function as it is presented in the paper. There is a small trick to avoid overflow: train initially with MSE loss, then switch to the EXP loss. My code monitors the MSE, and once it falls below a certain threshold it switches to the EXP loss. This takes less than a full epoch on my large datasets.
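The warm-up schedule described above can be sketched as follows. This is a minimal, framework-free illustration, not the author's actual code; the threshold value and the data iterator are hypothetical placeholders.

```python
import math

MSE_THRESHOLD = 1e-2  # hypothetical switch point; tune per dataset


def mse(preds, targets):
    """Mean squared error over a batch."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def exp_sse(preds, targets):
    """exp(sum of squared errors): the EXP loss.

    Overflows unless the errors are already small, which is why
    training starts with MSE instead.
    """
    return math.exp(sum((p - t) ** 2 for p, t in zip(preds, targets)))


# Sketch of the schedule: start with MSE, switch permanently to EXP
# once the monitored MSE drops below the threshold.
batches = [([0.9, 1.1], [1.0, 1.0])]  # toy stand-in for a real data loader
use_exp = False
for preds, targets in batches:
    loss = exp_sse(preds, targets) if use_exp else mse(preds, targets)
    # ... backprop on `loss` would happen here ...
    if not use_exp and mse(preds, targets) < MSE_THRESHOLD:
        use_exp = True  # EXP loss is now safe: errors are small
```

Because the switch happens only after the squared errors are small, the argument of `exp` stays in a safe range and the overflow in the question never occurs.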
Hi guys,
I want to confirm that you guys are still using the exponential loss function, i.e. taking the exponential of the sum of squared errors, and not the sum of the exponentials of the squared errors. I'm running into overflow problems because exp(sum of squared errors) gets too big.