This is possibly due to using `shape` instead of `batch_shape`, as this layer still cannot work with unknown batch sizes. Try setting a batch size in your layers and see if that fixes the issue. I will be working on getting it to run with unknown batch sizes, but haven't had time to look into it yet.
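For example, something like this (the batch size of 64 is an arbitrary illustration; `maxLen=20` matches the code further down):

```python
from keras.layers import Input

# Pin the batch dimension so the attention layer sees a fully known
# shape; batch_size=64 is an arbitrary choice for illustration.
batch_size, maxLen = 64, 20

input_context = Input(batch_shape=(batch_size, maxLen), dtype='int32',
                      name='input_context')
input_target = Input(batch_shape=(batch_size, maxLen), dtype='int32',
                     name='input_target')
```

With `batch_shape`, `model.fit` should then be called with that same `batch_size`.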
Commit `ae4b7ce3be9767ad9fb86b1a9c0e86691eb8efbc` should fix this issue. Please see if it works now.
Thank you. Actually, I have already implemented the attention mechanism according to some figures in your blog:
```python
from keras.layers import Activation, Dropout, concatenate, dot
from keras.models import Model

# Dot-product attention: score every decoder step against every encoder step
# (encoder_lstm/decoder_lstm, dense1/dense2, and the two Inputs come from
# the rest of the model)
attention = dot([decoder_lstm, encoder_lstm], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_lstm], axes=[2, 1])
decoder_combined_context = concatenate([context, decoder_lstm])

output = dense1(decoder_combined_context)
output = dense2(Dropout(0.5)(output))

model = Model([input_question, input_answer], output)
```
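To sanity-check what those two `dot` calls compute, here is the same pattern with the intermediate shapes spelled out (the standalone inputs are stand-ins for illustration, not part of the model above):

```python
from keras.layers import Input, Activation, concatenate, dot

# Illustrative stand-ins: B = batch, T_dec/T_enc = sequence lengths
dec = Input(shape=(None, 256))  # decoder outputs, (B, T_dec, 256)
enc = Input(shape=(None, 256))  # encoder outputs, (B, T_enc, 256)

scores = dot([dec, enc], axes=[2, 2])       # (B, T_dec, T_enc)
weights = Activation('softmax')(scores)     # normalized per decoder step
context = dot([weights, enc], axes=[2, 1])  # (B, T_dec, 256)
combined = concatenate([context, dec])      # (B, T_dec, 512)
print(combined.shape)                       # (?, ?, 512)
```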
Sounds good :)
First, thank you for your implementation of attention. When I built an LSTM seq2seq chatbot using your implementation, I got an error on the line `attn_out, attn_states = attn_layer([encoder_out, decoder_lstm])`, which throws `TypeError: __int__ returned non-int (type NoneType)`.
And my core code here:

```python
from keras.layers import (Input, Embedding, LSTM, Bidirectional, Dense,
                          TimeDistributed, Concatenate)
from keras.models import Model
from layers.attention import AttentionLayer  # adjust the import path to your checkout

# Shared embedding, initialized from a precomputed embedding_matrix
embed_layer = Embedding(input_dim=vocab_size, output_dim=50, trainable=True)
embed_layer.build((None,))
embed_layer.set_weights([embedding_matrix])

LSTM_cell = Bidirectional(LSTM(128, return_sequences=True, return_state=True))
LSTM_decoder = LSTM(256, return_sequences=True, return_state=True)
dense = TimeDistributed(Dense(vocab_size, activation='softmax'))

input_context = Input(shape=(maxLen,), dtype='int32', name='input_context')  # maxLen = 20
input_target = Input(shape=(maxLen,), dtype='int32', name='input_target')

input_context_embed = embed_layer(input_context)
input_target_embed = embed_layer(input_target)

# Bidirectional encoder; concatenate forward/backward states to seed the decoder
encoder_out, forward_h, forward_c, backward_h, backward_c = LSTM_cell(input_context_embed)
context_h = Concatenate()([forward_h, backward_h])
context_c = Concatenate()([forward_c, backward_c])

decoder_lstm, _, _ = LSTM_decoder(input_target_embed, initial_state=[context_h, context_c])

print('decoder_lstm.shape: ', decoder_lstm.shape)  # (?, ?, 256)
print('encoder_out.shape: ', encoder_out.shape)    # (?, ?, 256)

# ***********************Start Code Here**********************
''' Attention layer ***** A '''
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_out, decoder_lstm])
merge = Concatenate(axis=-1, name='concat_layer')([decoder_lstm, attn_out])
# ***********************End Code Here**********************

output = dense(merge)
model = Model([input_context, input_target], output)
model.summary()
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([context_, final_target_], outs, epochs=2, batch_size=128, validation_split=0.2)
```
And the error detail below:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\array_ops.py in zeros(shape, dtype, name)
   1810       shape = constant_op._tensor_shape_tensor_conversion_function(
-> 1811           tensor_shape.TensorShape(shape))
   1812     except (TypeError, ValueError):

c:\users\rnn_n\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\constant_op.py in _tensor_shape_tensor_conversion_function(s, dtype, name, as_ref)
    324     raise ValueError(
--> 325         "Cannot convert a partially known TensorShape to a Tensor: %s" % s)
    326     s_list = s.as_list()

ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, 256)

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
```
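For reference, the failure the traceback points at can be reproduced in isolation: the layer ends up asking for a zeros tensor whose batch dimension is still unknown, and with `Input(shape=...)` that dimension is `None` (a minimal TF 1.x sketch, not the layer's own code):

```python
import tensorflow as tf

# Stand-in for an encoder output with an unknown batch size (TF 1.x).
encoder_out = tf.placeholder(tf.float32, shape=(None, 20, 256))

# encoder_out.shape[0] is Dimension(None); requesting a zeros tensor of
# that shape raises the ValueError from the traceback, and the fallback
# path then fails with: TypeError: __int__ returned non-int (type NoneType)
tf.zeros(shape=(encoder_out.shape[0], 256))
```

This is also why the `batch_shape` workaround above avoids the error: with a pinned batch size, the zeros tensor has a fully known shape.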