nicodjimenez / lstm

Minimal, clean example of lstm neural network training in python, for learning purposes.

Why is there no 0.5* in the equation for the derivative of state(t)? #33

Closed tinymindxx closed 2 years ago

tinymindxx commented 6 years ago

Thanks for your great work on this blog; it's the clearest and simplest post on LSTMs I have read. In your blog, the derivative of s(t) is given by this equation: [image] I wonder why the left side of this equation equals the right side; in my opinion, left = 0.5 * right. And why do we need top_diff_s if we could calculate [image] directly? [image]

I am new to NNs and LSTMs; your blog is the best and clearest post on the implementation and derivation of LSTMs. Thanks for your great work.

nicodjimenez commented 6 years ago

Glad you find the post useful.

The chain rule of calculus says that in order to calculate the left-hand derivative, you need to account for all intermediate variables that are affected by changes in s_i and that themselves affect L(t). In this example, the only such intermediate variables are h_i(t) and h_i(t+1).

Why do you think that left = 0.5*right?
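Editorial note: the point above can be checked numerically with a toy example (hypothetical functions chosen for illustration, not the LSTM itself). Here `s` influences the loss through two intermediate variables, `h1` and `h2`, standing in for h_i(t) and h_i(t+1); the total derivative is the sum of both path contributions, with no 0.5 factor.

```python
# Toy multivariable chain rule check: L depends on s through TWO paths.
# With h1 = 2s and h2 = 3s, L = h1 * h2 = 6 s^2, so dL/ds = 12 s exactly.

def L(s):
    h1 = 2.0 * s          # first path (plays the role of h_i(t))
    h2 = 3.0 * s          # second path (plays the role of h_i(t+1))
    return h1 * h2        # loss depends on both intermediates

def dL_ds_chain_rule(s):
    h1, h2 = 2.0 * s, 3.0 * s
    dL_dh1, dL_dh2 = h2, h1            # partials of L w.r.t. each path
    dh1_ds, dh2_ds = 2.0, 3.0          # partials of each path w.r.t. s
    return dL_dh1 * dh1_ds + dL_dh2 * dh2_ds   # SUM of paths, not an average

s, eps = 1.5, 1e-6
numeric = (L(s + eps) - L(s - eps)) / (2 * eps)   # central finite difference
analytic = dL_ds_chain_rule(s)
print(analytic, numeric)   # both approximately 12 * s = 18.0
```

If the chain rule averaged the paths (the 0.5 factor), the analytic value would be 9.0 and would disagree with the finite-difference estimate.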

nicodjimenez commented 6 years ago

The section called "Higher dimensions" is relevant here: https://en.wikipedia.org/wiki/Chain_rule
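Editorial note: spelling out what the higher-dimensional chain rule gives in this setting may help. This sketch assumes the post's simplified cell, where h(t) = s(t) · o(t) with no output tanh, so that ∂h_i(t)/∂s_i(t) = o_i(t); the mapping to the code's variable names is an interpretation, not a quote from the post.

$$
\frac{dL}{ds_i(t)}
= \underbrace{\frac{\partial L}{\partial h_i(t)}\,\frac{\partial h_i(t)}{\partial s_i(t)}}_{o_i(t)\,\cdot\,\texttt{top\_diff\_h}}
\;+\;
\underbrace{\frac{\partial L}{\partial s_i(t+1)}\,\frac{\partial s_i(t+1)}{\partial s_i(t)}}_{\texttt{top\_diff\_s}}
$$

The two paths are summed; nothing in the chain rule introduces a factor of 0.5.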

tinymindxx commented 6 years ago

@nicodjimenez hi, thanks for your reply, it's really very very kind of you.

I think that in the equation below, [image] left = 0.5 * right, because:

[image] so we can get: [image]

In my opinion, the derivative is: [image]

Once we have top_diff_h, we can get ds_i(t).

And in your code, [image] if we change

`ds = self.state.o * top_diff_h + top_diff_s`

to

`ds = self.state.o * top_diff_h`

I tend to get a lower loss in my tests.

I don't know where the mistake is; I hope you can give me more tips.

thanks for your reply again!
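Editorial note: the claim that dropping `top_diff_s` still gives a valid gradient can be tested against a finite difference. The sketch below uses a hypothetical two-step toy recurrence (fixed scalar "gates", not the repo's full LSTM): the first cell state affects the loss both through its own output and, via the next cell state, through the next output, so the `top_diff_s`-style term is required.

```python
# Hypothetical two-step toy recurrence: s1 feeds h1 directly AND feeds s2,
# which feeds h2. The correct dL/ds1 must sum both contributions.

o1, o2, f2 = 0.5, 0.7, 0.9   # fixed "gate" values for the toy example

def loss(s1):
    h1 = o1 * s1             # first-step output
    s2 = f2 * s1 + 0.3       # next cell state carries s1 forward
    h2 = o2 * s2             # second-step output
    return h1**2 + h2**2     # toy loss over both outputs

def grad_full(s1):
    h1 = o1 * s1
    s2 = f2 * s1 + 0.3
    h2 = o2 * s2
    top_diff_h = 2 * h1               # dL/dh1
    top_diff_s = f2 * o2 * 2 * h2     # dL/ds1 routed through s2
    return o1 * top_diff_h + top_diff_s   # analogous to o * top_diff_h + top_diff_s

def grad_truncated(s1):
    h1 = o1 * s1
    return o1 * 2 * h1                # drops the top_diff_s term

s1, eps = 1.2, 1e-6
numeric = (loss(s1 + eps) - loss(s1 - eps)) / (2 * eps)
print(abs(grad_full(s1) - numeric))       # ~0: full gradient matches
print(abs(grad_truncated(s1) - numeric))  # large: second path's contribution missing
```

A lower training loss after removing the term does not mean the gradient is correct; a truncated gradient can still happen to decrease the loss on a particular run, while the finite-difference check above shows it is not the true derivative.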

nicodjimenez commented 2 years ago

This is the same point as the one made in https://github.com/nicodjimenez/lstm/pull/49. Please move discussion there.

Also, I don't follow your math:

[image]