Closed · tristandeleu closed this issue 8 years ago
Because the NTM is trained with the cross-entropy cost function, its gradient is undefined whenever a prediction is exactly 0 or 1.
```
In [17]: output
Out[17]:
array([[[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 1.,  1.,  1.,  1.,  1.,  1.,  0.,  1.,  0.]]])

In [18]: ntm_fun(input)
Out[18]:
array([[[  1.32529286e-35,   1.26697327e-23,   1.05187953e-46,
           7.28068425e-57,   3.54038684e-42,   3.13652896e-48,
           1.06308336e-22,   2.13761781e-93,   1.46246878e-45],
        [  6.83159575e-07,   2.58929423e-03,   2.93741369e-04,
           1.88398819e-08,   1.14350232e-04,   8.40953354e-12,
           6.11956356e-06,   5.67079701e-12,   1.30867115e-14],
        [  9.31182794e-01,   7.83335632e-01,   9.78310319e-01,
           9.99838718e-01,   9.92572563e-01,   9.95089144e-01,
           1.62297365e-01,   1.00000000e+00,   2.65213791e-08]]])

In [19]: new_params
Out[19]:
...
 b_dense: array([ 1.62206065,  0.91350565,  0.07703372,  1.46771803,
                  1.06758648,  0.60542304,  1.51142395,         nan,
                 -4.70645698])]
```
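For context, here is a minimal illustration of why this happens, in plain NumPy (not the NTM code itself; `bce_grad` is just a name for this example):

```python
import numpy as np

def bce_grad(p, t):
    """Derivative of the binary cross entropy
    -(t*log(p) + (1-t)*log(1-p)) with respect to the prediction p."""
    p = np.float64(p)
    return -(t / p - (1 - t) / (1 - p))

print(bce_grad(0.9, 0.0))  # 10.0 -- well defined
print(bce_grad(1.0, 0.0))  # inf  -- division by zero: gradient undefined
```

Once an `inf` enters the parameter update, operations like `0 * inf` or `inf - inf` produce `nan`, which matches the `nan` that shows up in `b_dense` above.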
One solution could be to scale the predictions into the `[epsilon, 1 - epsilon]` range.
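A minimal sketch of that idea, assuming the Theano stack this project builds on; the function name and the epsilon value here are illustrative, not taken from the actual fix:

```python
import theano.tensor as T

def clipped_binary_crossentropy(prediction, target, epsilon=1e-6):
    # Keep predictions strictly inside (0, 1) so that log(prediction)
    # and log(1 - prediction) -- and hence the gradient -- stay finite.
    prediction = T.clip(prediction, epsilon, 1.0 - epsilon)
    return T.nnet.binary_crossentropy(prediction, target)
```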
Fixed in c0f85e14e8546f310db468cf27a6eb845821d90e