nicodjimenez / lstm

Minimal, clean example of lstm neural network training in python, for learning purposes.

y_list dimension #1

Closed evbo closed 9 years ago

evbo commented 9 years ago

Conceptually, why don't y_list and input_val_arr have the same dimensions? Is your toy example intended as a multi-input, single-output network?

Setting x_dim = 1 converged poorly (a feeble attempt to make y_list and input_val_arr look more familiar by giving them equal size). However, replicating y_list so that each x has an associated y (single-input, single-output) converged adequately and looks more like the common examples I've seen around the web:

# replicate each target across x_dim so every input element has an associated target
yy = [-0.5, 0.2, 0.1, -0.5]
y_list = [[y for _ in xrange(x_dim)] for y in yy]
input_val_arr = [np.random.random(x_dim) for _ in yy]

and then I made very minimal changes to the ToyLossLayer:

@classmethod
def loss(self, pred, label):
    # compare only the first len(label) entries of the hidden state against the label vector
    return np.sum((pred[:len(label)] - label) ** 2)

@classmethod
def bottom_diff(self, pred, label):
    # gradient of the squared error; entries of pred beyond len(label) get zero gradient
    diff = np.zeros_like(pred)
    diff[:len(label)] = 2 * (pred[:len(label)] - label)
    return diff
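
For illustration only (not code from the repo, and assuming the modified ToyLossLayer above is defined): pred is still the full hidden state vector, while label is now a vector, so the loss compares the first len(label) components of pred against it.

import numpy as np

pred = np.random.random(100)                 # stand-in for a hidden state vector
label = np.array([-0.5, -0.5, -0.5, -0.5])   # replicated target for one time step
print(ToyLossLayer.loss(pred, label))        # sum of squared errors over the first 4 entries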

I'm very new to neural networks and so far have only seen single-input, single-output cases where x and y are equally sized vectors (for example, Karpathy's word prediction RNN: https://gist.github.com/karpathy/d4dee566867f8291f086). I thought I'd check with you on your intention for example_0() to make sure I was understanding your work fully. Am I close?

Thanks! This is the easiest-to-read LSTM code I've found.

nicodjimenez commented 9 years ago

@evbo , you are correct, my example was intended as multiple inputs, single output. Typically, there is an inner product layer + softmax on top of the network which takes the hidden state at every time point and predicts something (e.g. the current character). Since this would involve writing another layer and I wanted to keep the code as simple as possible, I implicitly made this extra layer w^T h, where w = [1,0,...] and h is the value of the hidden layer. Would it be less confusing if I added another layer?

With a single input, single output, the network simply does not have enough complexity to predict a complex sequence. This would work a lot better if you added an inner product layer on top in order to decode the internal dynamics.
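
To make the readout concrete, here is a minimal sketch of the two options described above, using a stand-in hidden state vector rather than the repo's classes (mem_cell_ct = 100 as in the test example; the explicit readout weights w_out and b_out are hypothetical and would need to be learned):

import numpy as np

mem_cell_ct = 100                       # hidden state size, as in the repo's test example
h_t = np.random.random(mem_cell_ct)     # stand-in for the hidden state at one time step

# Implicit readout used in example_0(): w = [1, 0, ..., 0], so w^T h_t
# is simply the first component of the hidden state.
w_implicit = np.zeros(mem_cell_ct)
w_implicit[0] = 1.0
pred_implicit = np.dot(w_implicit, h_t)     # identical to h_t[0]

# Hypothetical explicit inner product readout that decodes the whole
# hidden state instead of only its first component.
w_out = np.random.random(mem_cell_ct)       # would be learned in practice
b_out = 0.0                                 # would be learned in practice
pred_explicit = np.dot(w_out, h_t) + b_out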

evbo commented 9 years ago

Thanks for confirming. This makes sense. I think the confusion is mostly stylistic - some people come from a background where it is customary to have an output squashing function and some do not. It seems, even according to Lipton's paper, that the extra squashing function is optional, with no evidence of it being necessary.

mishfaq commented 8 years ago

Kindly help me, what's going on here:

y_list = [-0.5, 0.2, 0.1, -0.5]
input_val_arr = [np.random.random(x_dim) for _ in y_list]

What does the _ mean in the for loop statement above?

Please also elaborate; I am new to Python and neural networks.

ScottMackay2 commented 7 years ago

You could search for the _ syntax yourself on Google (it is just a convention indicating that the variable will not be used). And this is clearly not a Python tutorial; you should follow one first.
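
As a tiny illustration (a sketch using the toy values from the repo's example):

import numpy as np

x_dim = 50
y_list = [-0.5, 0.2, 0.1, -0.5]

# "_" is a throwaway name: the loop value itself is never used, we only
# need one random input vector per entry of y_list.
input_val_arr = [np.random.random(x_dim) for _ in y_list]
print(len(input_val_arr))    # 4, one input vector per target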

zackchase commented 7 years ago

Hi Scott. I think there was a much nicer way to answer this question. Even diligent, considerate people will ask some questions whose answers can be found in tutorials.

ScottMackay2 commented 7 years ago

Sorry, it was not my intention to come across the wrong way.

MrLeexm commented 3 years ago

I sincerely hope you can answer my question soon; it has been bothering me. From your code, it looks like a multiple-input, multiple-output LSTM. You take the first value of state.h in each of the four LSTM time nodes as an output. Isn't that multiple outputs? As far as I know, single output means taking only the final node's output as the result. Am I right?