Deep learning is hard. Really small, seemingly silly changes can have a big impact. One such change I've discovered is to clip the log-likelihood. By capping it at log(1-p), the loss stops pushing censored observations to the right once the likelihood of the observation reaches 1-p.
This makes infinity-predictions controllable and fixes 99% of the NaN and numerical-instability problems. I.e., if we set p=1e-4, there is zero gradient contribution once the model has found a threshold t such that Pr(Y>t)=0.9999. I previously refrained from clipping since t then loses its literal meaning; I figured the prediction should/could go to infinity and training should be allowed to fail. With clipping this won't happen. Interpretations of predictions should be modified to account for this, but I concluded the benefits outweigh this minor problem.
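As a minimal sketch of the idea (plain NumPy, not the wtte code): for a right-censored Weibull observation the log-likelihood is the log-survival, and the cap at log(1-p) is the only new piece. `a` and `b` here denote the Weibull scale and shape.

```python
import numpy as np

def censored_loglik_clipped(t, a, b, p=1e-4):
    """Clipped log-likelihood for a right-censored observation.

    For a Weibull(a, b) model the log-survival is
    log Pr(Y > t) = -(t / a)**b, which tends to 0 as the
    prediction is pushed to the right. Capping it at log(1 - p)
    zeroes the gradient once Pr(Y > t) >= 1 - p.
    """
    loglik = -(t / a) ** b                   # log Pr(Y > t), always <= 0
    return np.minimum(loglik, np.log1p(-p))  # cap at log(1 - p)
```

Once the cap is active the loss is constant in `a` and `b`, so the censored observation contributes no gradient and the predicted scale can no longer run off toward infinity.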
[x] Version number
[x] Rerun wtte-rnn-examples
[ ] Add changelog
Changes
Add clipping to log-likelihood dcebad233bd318f8529463bb51a38dfc77434a21
Deprecate penalization of beta for regularization. I've found that clipping and modulating beta through the activation function parameters is much more effective. c9bfdba609c55af24b55d0da8ce75a9bd964e9cd
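A hedged illustration of the idea in plain NumPy (the parameter names `init_beta` and `max_beta` are assumptions for this sketch, not necessarily the wtte API): instead of penalizing large beta in the loss, the output activation itself keeps beta positive and bounded.

```python
import numpy as np

def beta_activation(x, init_beta=1.0, max_beta=5.0):
    """Map a raw network output to a bounded, positive beta.

    Softplus keeps beta positive; scaling by init_beta sets the
    value at x = 0; the cap at max_beta replaces an explicit
    regularization penalty on beta. (Illustrative sketch only.)
    """
    beta = np.logaddexp(0.0, x)  # numerically stable softplus
    return np.minimum(beta * init_beta, max_beta)
```

Because the constraint lives in the activation, no extra penalty term is needed in the loss, and beta can never leave the allowed range during training.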
Backward-compatible updates to the wtte API. It's just a little less ugly, i.e. call loss_fun = wtte.Loss(type='discrete').loss_function instead of wtte.loss.... ba130459c52dfe4ba1bbb67c920c51ede73077ab
Added an output-layer-bias pre-training step to the wtte-rnn-examples. It improves numerical stability and greatly shortens training time, even though having a step like this is ugly.
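The intuition behind the bias step can be sketched as follows (an illustrative assumption about what such a step does, in plain NumPy, not the actual wtte-rnn-examples code): choose the output-layer bias so that, assuming an exp activation on the alpha output, the network's initial alpha prediction matches the mean observed time-to-event instead of starting from an arbitrary value.

```python
import numpy as np

def init_alpha_bias(tte):
    """Bias for an exp-activated alpha output such that the initial
    alpha prediction equals the mean observed time-to-event.
    (Illustrative sketch; not the wtte-rnn-examples code.)
    """
    return np.log(np.mean(tte))

tte = np.array([1.0, 2.0, 3.0, 6.0])
bias = init_alpha_bias(tte)  # exp(bias) ~ mean(tte) = 3.0
```

Starting the alpha output near the empirical mean means early gradient steps don't have to climb out of an extreme initial prediction, which is consistent with the stability and training-time gains noted above.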