hycis / bidirectional_RNN

bidirectional lstm

Error when running #8

Open daquang opened 8 years ago

daquang commented 8 years ago

I get the following error when I run the IMDB example:

Traceback (most recent call last):
  File "imdb_birnn.py", line 77, in <module>
    model.add(BatchNormalization((24 * maxseqlen,)))
  File "/home/dxquang/anaconda/lib/python2.7/site-packages/keras/layers/containers.py", line 40, in add
    layer.init_updates()
  File "/home/dxquang/anaconda/lib/python2.7/site-packages/keras/layers/normalization.py", line 38, in init_updates
    X = self.get_input(train=True)
  File "/home/dxquang/anaconda/lib/python2.7/site-packages/keras/layers/core.py", line 43, in get_input
    return self.previous.get_output(train=train)
  File "/home/dxquang/anaconda/lib/python2.7/site-packages/keras/layers/core.py", line 296, in get_output
    X = self.get_input(train)
  File "/home/dxquang/anaconda/lib/python2.7/site-packages/keras/layers/core.py", line 43, in get_input
    return self.previous.get_output(train=train)
  File "/home/dxquang/bidirectional_RNN/birnn.py", line 187, in get_output
    forward = self.get_forward_output(train)
  File "/home/dxquang/bidirectional_RNN/birnn.py", line 143, in get_forward_output
    X = X.dimshuffle((1,0,2))
  File "/home/dxquang/anaconda/lib/python2.7/site-packages/theano/tensor/var.py", line 341, in dimshuffle
    pattern)
  File "/home/dxquang/anaconda/lib/python2.7/site-packages/theano/tensor/elemwise.py", line 141, in __init__
    (i, j, len(input_broadcastable)))
ValueError: new_order[2] is 2, but the input only has 2 axes.

hycis commented 8 years ago

I see, the default of return_sequences for the bidirectional RNN is set to False. To fix that, I just added

model.add(BiDirectionLSTM(word_vec_len, 50, output_mode='concat'), return_sequences=True)

to replace

model.add(BiDirectionLSTM(word_vec_len, 50, output_mode='concat'))

https://github.com/hycis/bidirectional_RNN/blob/master/imdb_birnn.py#L72
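With return_sequences=False the layer only returns the last timestep, a 2D tensor of shape (samples, features), so the next layer's X.dimshuffle((1,0,2)) has no third axis to move, which is the ValueError above. A rough numpy analogy (the shapes are made up for illustration, and numpy's transpose stands in for Theano's dimshuffle):

import numpy as np

seq_out = np.zeros((32, 100, 50))   # return_sequences=True  -> 3D (samples, timesteps, features)
last_out = np.zeros((32, 50))       # return_sequences=False -> 2D (samples, features)

seq_out.transpose(1, 0, 2)          # fine: three axes to permute, like dimshuffle((1,0,2))
# last_out.transpose(1, 0, 2)       # would raise ValueError: axes don't match array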

daquang commented 8 years ago

I believe you still have some errors. In your newest version, you have these lines:

model.add(BiDirectionLSTM(word_vec_len, 50, output_mode='concat'), return_sequences=True)
model.add(BiDirectionLSTM(100, 24, output_mode='sum'), return_sequences=True)

After changing these two lines as follows, the code works as intended:

model.add(BiDirectionLSTM(word_vec_len, 50, output_mode='concat', return_sequences=True))
model.add(BiDirectionLSTM(100, 24, output_mode='sum', return_sequences=True))
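For context, the stacked section of imdb_birnn.py then looks roughly like this. This is a sketch pieced together from the lines quoted in this thread, not the script verbatim; in particular, the Flatten step before BatchNormalization is an assumption based on the (24 * maxseqlen,) input shape:

# Stacked bidirectional LSTMs, with the corrected keyword placement
model.add(BiDirectionLSTM(word_vec_len, 50, output_mode='concat', return_sequences=True))  # concat: 50 forward + 50 backward = 100
model.add(BiDirectionLSTM(100, 24, output_mode='sum', return_sequences=True))              # sum: stays at 24 per timestep
model.add(Flatten())                              # assumed: collapse (maxseqlen, 24) into a 24 * maxseqlen vector
model.add(BatchNormalization((24 * maxseqlen,)))  # the line that raised the traceback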

hycis commented 8 years ago

Yup, you are right, thanks for pointing that out.

shwetgarg commented 8 years ago

I am new to bidirectional LSTMs, sorry if this is too trivial.

I have some doubts about the following lines:

# --- Stacked up BiDirectionLSTM layers ---
model.add(BiDirectionLSTM(word_vec_len, 50, output_mode='concat', return_sequences=True))
model.add(BiDirectionLSTM(100, 24, output_mode='sum', return_sequences=True))

If this is a stacked LSTM, shouldn't the output of the first layer (50) equal the input of the second layer (100)? It would be nice if you could help me understand that part.

hycis commented 8 years ago

@shwetgarg Because it's bidirectional, there is one output from the forward pass and one from the backward pass. We can either 'concat' the two outputs, which gives twice the vector length, or 'sum' them, which keeps the same length. For the first LSTM I use output_mode='concat', which is why the size doubles: 50 forward + 50 backward = 100, matching the second layer's declared input.
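A quick way to see the two output modes, as a minimal plain-numpy sketch with the widths chosen to match the layers above:

import numpy as np

# Hypothetical per-timestep outputs of the forward and backward LSTMs, 50 units each.
forward = np.random.randn(50)
backward = np.random.randn(50)

concat = np.concatenate([forward, backward])  # shape (100,): 'concat' doubles the width
summed = forward + backward                   # shape (50,):  'sum' keeps the width

So the first layer's 'concat' output is 100 wide, which is exactly the input size declared for the second BiDirectionLSTM.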