Closed: tinymindxx closed this issue 2 years ago
Glad you find the post useful.
The chain rule of calculus says that to calculate the derivative on the left-hand side, you need to take into account all intermediate variables that are affected by changes in s_i(t) and that themselves affect L(t). In this example, the only intermediate variables are h_i(t) and h_i(t+1).
Why do you think that left = 0.5*right?
The section called "Higher dimensions" is relevant here: https://en.wikipedia.org/wiki/Chain_rule
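To make the multivariable chain rule concrete, here is a minimal numerical sketch. The functions `h1`, `h2`, and `loss` are hypothetical toy functions chosen only for illustration; they are not taken from the blog post or the repository. The sketch checks that when a quantity reaches the loss through two intermediate variables, the total derivative is the sum of the two path contributions, with no factor of 0.5.

```python
import math

# Hypothetical toy functions (not from the blog or the repo).
def h1(s):
    return math.tanh(s)

def h2(s):
    return s * s

def loss(a, b):
    return a * b + 0.5 * a * a

def total(s):
    # The loss depends on s only through the two intermediates h1(s) and h2(s).
    return loss(h1(s), h2(s))

def chain_rule_grad(s):
    a, b = h1(s), h2(s)
    dL_da = b + a          # partial derivative of loss w.r.t. its first argument
    dL_db = a              # partial derivative of loss w.r.t. its second argument
    da_ds = 1.0 - a * a    # derivative of tanh(s)
    db_ds = 2.0 * s        # derivative of s**2
    # Sum over both paths from s to the loss.
    return dL_da * da_ds + dL_db * db_ds

def numeric_grad(f, s, eps=1e-6):
    # Central finite difference, used as an independent check.
    return (f(s + eps) - f(s - eps)) / (2 * eps)

s0 = 0.7
analytic = chain_rule_grad(s0)
numeric = numeric_grad(total, s0)
print(analytic, numeric)
```

The two printed values should agree to several decimal places, which is what the summed form of the chain rule predicts.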
@nicodjimenez Hi, thanks for your reply; it's really very kind of you.
I think that, in the equation below,
left = 0.5*right because:
so we get:
In my opinion, the derivative is:
Once we have top_diff_h, we can get ds_i(t).
In your code,
if we change
`ds = self.state.o * top_diff_h + top_diff_s`
to
`ds = self.state.o * top_diff_h`
the loss tends to be lower in my tests.
I don't know where I went wrong; I hope you can give me some more tips.
Thanks again for your reply!
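One way to compare the two `ds` formulas is a finite-difference check on a tiny example. The sketch below is a simplified two-step scalar recurrence, not the repository's code. It assumes `h = o * s` and `s_next = f * s + c`, holds the gate values fixed as constants, and uses a squared-error loss at each step. All names and numbers in it are hypothetical.

```python
# Toy two-step scalar recurrence. Gate values are fixed constants here
# (an assumption for illustration; in a real LSTM they depend on x(t) and h(t-1)).
o1, o2 = 0.6, 0.9      # output gates at steps 1 and 2
f2 = 0.8               # forget gate at step 2
c2 = 0.3               # stands in for g * i at step 2
y1, y2 = 0.5, -0.2     # per-step targets

def total_loss(s1):
    h1 = o1 * s1
    s2 = f2 * s1 + c2
    h2 = o2 * s2
    return (h1 - y1) ** 2 + (h2 - y2) ** 2

# Forward pass.
s1 = 0.4
h1 = o1 * s1
s2 = f2 * s1 + c2
h2 = o2 * s2

# Backward pass, last step first.
top_diff_h2 = 2.0 * (h2 - y2)
ds2 = o2 * top_diff_h2           # no later step, so top_diff_s is 0 at step 2
top_diff_s1 = f2 * ds2           # gradient reaching s1 through s2 = f2 * s1 + c2
top_diff_h1 = 2.0 * (h1 - y1)

ds_full = o1 * top_diff_h1 + top_diff_s1   # with the top_diff_s term
ds_truncated = o1 * top_diff_h1            # without the top_diff_s term

# Central finite difference as an independent reference.
eps = 1e-6
ds_numeric = (total_loss(s1 + eps) - total_loss(s1 - eps)) / (2 * eps)
print(ds_full, ds_truncated, ds_numeric)
```

In this toy setting, only the version that includes `top_diff_s` matches the numerical derivative of the total loss with respect to `s1`. It says nothing about which variant reaches a lower loss in a particular training run.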
This is the same point as the one made in https://github.com/nicodjimenez/lstm/pull/49. Please move discussion there.
Also, I don't follow your math:
Thanks for your great work on this blog; it's the clearest and simplest blog post on LSTMs I have read. In your blog, the derivative of s(t) is given by this equation,
I wonder why the left-hand side of this equation is equal to the right-hand side.
In my opinion, left = 0.5*right. Also, why do we need top_diff_s if we could calculate
directly?
![image](https://user-images.githubusercontent.com/5286725/36970071-407979de-20a2-11e8-8686-6b07c3fa4e15.png)
I am new to neural networks and LSTMs. Your blog is the best and clearest one about the implementation and derivation of LSTMs. Thanks for your great work.