Closed evbo closed 9 years ago
@evbo , you are correct, my example was intended as multiple inputs, single output. Typically, there is an inner product layer + softmax on top of the network which takes the hidden state at every time point and predicts something (e.g. the current character). Since this would involve writing another layer and I wanted to keep the code as simple as possible, I implicitly made this extra layer w^T h, where w = [1,0,...] and h is the value of the hidden layer. Would it be less confusing if I added another layer?
With a single input, single output, the network simply does not have enough complexity to predict a complex sequence. This would work a lot better if you added an inner product layer on top in order to decode the internal dynamics.
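To illustrate the implicit readout described above (a sketch of the idea, not code from the repository): taking the first component of the hidden state at each time step is equivalent to applying a fixed inner-product output layer with w = [1, 0, ...].

```python
import numpy as np

hidden_dim = 100
h = np.random.random(hidden_dim)  # hidden state at one time step

# Fixed (untrained) readout weights: w = [1, 0, 0, ...]
w = np.zeros(hidden_dim)
w[0] = 1.0

# The implicit output layer w^T h simply picks out the first hidden unit
assert np.isclose(np.dot(w, h), h[0])
```

A trainable inner-product layer would replace the fixed `w` with learned weights, which is the extra decoding capacity referred to above.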
Thanks for confirming. This makes sense. I think style is where people get confused: some come from a background where it is customary to have an output squashing function and some do not. It seems, even according to Lipton's paper, that the extra squashing function is optional, with no evidence of it being necessary.
Kindly help me understand what is going on here:

y_list = [-0.5, 0.2, 0.1, -0.5]
input_val_arr = [np.random.random(x_dim) for _ in y_list]

What does the `_` in the for-loop of the comprehension mean? Please also elaborate; I am new to Python and neural networks.
You could search for the `_` syntax yourself on Google (it is just a convention for saying that the variable will not be used). And this was clearly not a Python tutorial. You should follow one first.
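For completeness, a minimal illustration of the `_` convention and the comprehension in question (the underscore appears to have been eaten by formatting in the quoted snippet; variable names follow the repository's example):

```python
import numpy as np

x_dim = 50
y_list = [-0.5, 0.2, 0.1, -0.5]

# `_` is the conventional name for a loop variable whose value is not used:
# we only need len(y_list) random input vectors, not the y values themselves.
input_val_arr = [np.random.random(x_dim) for _ in y_list]

print(len(input_val_arr))      # 4 — one input vector per target value
print(input_val_arr[0].shape)  # (50,)
```

The comprehension iterates over `y_list` purely to control the count; an equivalent form would be `for _ in range(len(y_list))`.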
Hi Scott. I think there was a much nicer way to answer this question. Even diligent, considerate people will ask some questions whose answers can be found in tutorials.
Sorry, it was not my intention to come across the wrong way.
I sincerely hope you can answer my question soon; it's killing me. From your code, it looks like a multiple-input, multiple-output LSTM: you take the first value of `state.h` at each of the four LSTM time nodes as the output. Isn't that multiple outputs? As far as I know, single output means taking only the final node's output as the result. Am I right?
Conceptually, why don't `y_list` and `input_val_arr` have the same dimensions? Is your toy example intended as a multi-input-single-output network?

Changing `x_dim = 1` had poor results on convergence (a feeble attempt to make `y_list` and `input_val_arr` look more familiar by having equal size). However, replicating `y_list`, such that for each `x` there is an associated `y` (single-input-single-output), had adequate convergence and looked more familiar to common examples I've seen around the web:

and then I made very minimal changes to the `ToyLossLayer`:

I'm very new to nn and so far have only seen single-input-single-output cases where `x` and `y` are equally sized vectors (for example, Karpathy's word prediction RNN: https://gist.github.com/karpathy/d4dee566867f8291f086). I thought to check with you on your intention for `example_0()` to ensure I was understanding your work fully. Am I close?

Thanks! This is the easiest-to-read code I've found on LSTM.