pbhatia243 / Neural_Conversation_Models

Tensorflow based Neural Conversation Models
Apache License 2.0

beam_attention_decoder ValueError #1

Closed jmugan closed 8 years ago

jmugan commented 8 years ago

Cool code. In beam_attention_decoder, I get an error at line 615:

s = math_ops.reduce_sum(
              v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])

It looks like maybe y takes the beams into account but hidden_features[a] does not.

The stack trace looks like:

File "/Users/jmugan/Box Sync/workspace/DeepLearning/Git/21CT_Translate/ObjectTranslatorTF/ncm_seq2seq.py", line 615, in attention
    v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 518, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 44, in add
    return _op_def_lib.apply_op("Add", x=x, y=y, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2156, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1612, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1390, in _BroadcastShape
    % (shape_x, shape_y))
ValueError: Incompatible shapes for broadcasting: (32, 2, 1, 300) and (320, 1, 1, 300)
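For reference, a minimal sketch of the kind of shape fix that would make this broadcast work, assuming batch size 32, beam size 10, attention length 2, and attention size 300 as implied by the error message. The names follow the snippet above, but the tiling (repeating the encoder-side features across beams) and its ordering are an assumption about how the beam-expanded batch is laid out, not the repository's actual fix.

import tensorflow as tf

# Shapes implied by the ValueError:
#   hidden_features[a] : (batch, attn_length, 1, attn_size)      = (32, 2, 1, 300)
#   y                  : (batch * beam_size, 1, 1, attn_size)    = (320, 1, 1, 300)
batch_size, beam_size, attn_length, attn_size = 32, 10, 2, 300

hidden_features_a = tf.ones([batch_size, attn_length, 1, attn_size])
y = tf.ones([batch_size * beam_size, 1, 1, attn_size])
v_a = tf.ones([attn_size])

# Tile the encoder-side features across beams so the leading dimension
# matches the beam-expanded query before broadcasting.
hidden_tiled = tf.tile(hidden_features_a, [beam_size, 1, 1, 1])  # (320, 2, 1, 300)

s = tf.reduce_sum(v_a * tf.tanh(hidden_tiled + y), [2, 3])       # (320, 2)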
pbhatia243 commented 8 years ago

Did you run it with the Ubuntu dataset?

pbhatia243 commented 8 years ago

You should use the beam decoder only for evaluation.

jmugan commented 8 years ago

Different dataset. Thanks for the response. I'll try using it only for evaluation.

dimeldo commented 8 years ago

@jmugan Could you please share your results?

jmugan commented 8 years ago

Sure. I'll try it in the next few days and let you know how it comes out.

jmugan commented 8 years ago

How do you tell which of the beams returned the best sequence? At line 253 of neural_conversation_model.py, the output of model.step is:

path, symbol, output_logits = model.step(sess, encoder_inputs, decoder_inputs,
                                         target_weights, bucket_id, True, beam_search)

Here, output_logits is not used, as far as I can tell. It is a list of length output_size where each entry has length beam_size, but how is it different from symbol?

Using symbol and path, you can put together the beam_size responses, but which one is best? Why doesn't output_logits contain their costs?
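For readers hitting the same question: a hedged sketch of how beam hypotheses are typically reassembled from per-step parent pointers and emitted symbols. It assumes path[t][k] holds the parent beam index and symbol[t][k] the token id chosen for beam k at step t; that is a guess at the semantics of the values returned by model.step, not the repository's documented contract.

# Hypothetical reconstruction of beam hypotheses from step outputs.
# Assumes path and symbol are lists over decoding steps, each entry a
# list of length beam_size; path[t][k] points to beam k's parent at the
# previous step, symbol[t][k] is the token emitted for beam k at step t.
def assemble_beams(path, symbol, beam_size):
    num_steps = len(symbol)
    hypotheses = []
    for k in range(beam_size):
        tokens = []
        beam = k
        # Walk backwards through the parent pointers.
        for t in range(num_steps - 1, -1, -1):
            tokens.append(symbol[t][beam])
            beam = path[t][beam]
        hypotheses.append(list(reversed(tokens)))
    return hypotheses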

pbhatia243 commented 8 years ago

It depends on the task. In Smart Reply, they find diverse replies by measuring how different the replies are using semi-supervised clustering. Right now, the beams are sorted by lowest perplexity, but the lowest-perplexity beam might not be the best reply for what you are looking for.
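To make the "sorted by lowest perplexity" point concrete, a small sketch of ranking finished hypotheses by per-token perplexity. It assumes you have accumulated the sum of log-probabilities of the chosen tokens for each hypothesis, which is an assumption about what output_logits could provide, not the code's actual behavior.

import math

# Hypothetical ranking of finished beam hypotheses by per-token perplexity.
# Each candidate is (token_ids, total_log_prob), where total_log_prob is
# assumed to be the summed log-probability of the chosen tokens.
def rank_by_perplexity(candidates):
    def perplexity(tokens, total_log_prob):
        return math.exp(-total_log_prob / max(len(tokens), 1))
    return sorted(candidates, key=lambda c: perplexity(c[0], c[1]))

# Lowest perplexity comes first; whether that is the "best" reply depends
# on the task, as noted above (e.g. Smart Reply re-ranks for diversity).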