Closed. tonyzhang617 closed this 2 years ago.
Are you sure about this? I haven't looked at this in a while. Doesn't top_diff_s
need to be propagated recursively along the constant error carousel?
Forward equation:
Derivatives:
I don't see where the math is wrong.
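One way to check this concretely is a finite-difference gradient test on the cell-state path alone. The sketch below is not the repository's code. It assumes the forward equations s(t) = g(t) * i(t) + s(t-1) * f(t) and h(t) = s(t) * o(t), a squared-error loss, and gates held as fixed numbers (in the real network they depend on the input and on h(t-1)). The names `forward`, `loss`, `grad_s0`, and `use_top_diff_s` are hypothetical, chosen only for this illustration.

```python
# Simplified scalar cell: gates are fixed numbers (an assumption made to
# isolate the cell-state path; a real LSTM computes them from x and h).

def forward(s0, g, i, f, o):
    """Run s(t) = g*i + s(t-1)*f and h(t) = s(t)*o over all time steps."""
    s_prev, hs = s0, []
    for t in range(len(g)):
        s = g[t] * i[t] + s_prev * f[t]
        hs.append(s * o[t])
        s_prev = s
    return hs

def loss(s0, g, i, f, o, y):
    """Sum of squared errors between h(t) and the targets y(t)."""
    return sum((h - yt) ** 2 for h, yt in zip(forward(s0, g, i, f, o), y))

def grad_s0(s0, g, i, f, o, y, use_top_diff_s=True):
    """Backpropagate dL/ds(0), optionally dropping the recursive term."""
    hs = forward(s0, g, i, f, o)
    top_diff_s = 0.0  # gradient arriving from s(t+1); zero at the last step
    for t in reversed(range(len(g))):
        top_diff_h = 2.0 * (hs[t] - y[t])
        ds = o[t] * top_diff_h
        if use_top_diff_s:
            ds += top_diff_s      # contribution carried back along the cell state
        top_diff_s = ds * f[t]    # what gets handed to s(t-1)
    return top_diff_s

# Arbitrary illustrative values, not taken from the example in the repository.
g, i = [0.5, -0.3, 0.8], [0.9, 0.6, 0.7]
f, o = [0.8, 0.9, 0.7], [0.6, 0.5, 0.9]
y, s0, eps = [0.1, -0.2, 0.3], 0.2, 1e-5

numeric = (loss(s0 + eps, g, i, f, o, y) - loss(s0 - eps, g, i, f, o, y)) / (2 * eps)
with_term = grad_s0(s0, g, i, f, o, y, use_top_diff_s=True)
without_term = grad_s0(s0, g, i, f, o, y, use_top_diff_s=False)

print("finite difference :", numeric)
print("with top_diff_s   :", with_term)
print("without top_diff_s:", without_term)
```

In this simplified setting, the version that keeps top_diff_s should agree with the finite-difference estimate, while the version without it only accounts for the loss at the first time step. A lower final loss on one example does not by itself show which gradient is correct, so a check like this against the actual network would be more conclusive.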
More details from the blog post:
The derivative ds was calculated incorrectly. In this PR I removed the variable top_diff_s, which was wrong. After removing it and re-running the example, the loss after training is significantly lower than before.
Output Before Change:
iter 99: y_pred = [-0.50033, 0.20106, 0.09912, -0.49923], loss: 2.611e-06
Output After Change:
iter 99: y_pred = [-0.49995, 0.19999, 0.10001, -0.50006], loss: 6.078e-09