tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

TensorFlow: BeamSearchDecoder question, pre-trained weights seem not to be loaded #330

Open minorfox opened 6 years ago

minorfox commented 6 years ago

I am implementing a Seq2Seq model in TensorFlow and have pre-trained weights for it. With the greedy decoder I get a normal score such as 14.12, but with BeamSearchDecoder (beam size 3 or 5, same result either way) I get a score of only 0.02, as if the weights had not been loaded. This has confused me all day. Could someone give me a clue?

Here is the relevant code. First, the attention setup:

if model_helper.beam_size > 0:
    memory_len = tf.contrib.seq2seq.tile_batch(memory_len, multiplier=model_helper.beam_size)
    memory_inp = tf.contrib.seq2seq.tile_batch(memory_inp, multiplier=model_helper.beam_size)

attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units=model_helper.unit_size,
                                                        memory=memory_inp,
                                                        memory_sequence_length=memory_len)

deco_cell = tf.contrib.seq2seq.AttentionWrapper(cell=cell,
                                                attention_mechanism=attention_mechanism,
                                                attention_layer_size=model_helper.unit_size,
                                                alignment_history=self.trainable or model_helper.beam_size == 0,
                                                name='attention')
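
For readers following the tiling step above: `tile_batch` repeats each batch entry `multiplier` times and keeps the copies adjacent, rather than concatenating whole copies of the batch. The pure-Python sketch below only illustrates that ordering on a plain list; `tile_batch_sketch` is a hypothetical stand-in, not the TensorFlow function, and it ignores nested structures and trailing dimensions.

```python
def tile_batch_sketch(batch, multiplier):
    """Illustrative stand-in: repeat each batch entry `multiplier` times, copies adjacent."""
    return [entry for entry in batch for _ in range(multiplier)]


# Two source sentences with lengths 4 and 2, beam size 3.
memory_len = [4, 2]
tiled_len = tile_batch_sketch(memory_len, 3)
print(tiled_len)  # [4, 4, 4, 2, 2, 2]
```

The point of the ordering is that the memory, its sequence lengths, and the initial cell state must all be tiled the same way, so that row `i * beam_size + k` of each tensor refers to beam `k` of source sentence `i`.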

Here is the greedy decoder, which seems to work correctly when I load my pre-trained weights. I use a variable "beam_size" to switch between greedy and beam search decoding:

if model_helper.beam_size == 0:
    decoder_initial_state = deco_cell.zero_state(model_helper.batch_size, tf.float32).clone(cell_state=init_states)
    start_tokens = tf.fill([model_helper.batch_size], model_helper.GO_ID)
    end_token = model_helper.END_ID
    print('using greedy decoder...')
    helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding=self.embed_dict,
                                                      start_tokens=start_tokens,
                                                      end_token=end_token)

    decoder = tf.contrib.seq2seq.BasicDecoder(cell=deco_cell,
                                              helper=helper,
                                              initial_state=decoder_initial_state,
                                              output_layer=output_layer)

Next is the BeamSearchDecoder branch:

tiled_states = tf.contrib.seq2seq.tile_batch(init_states, multiplier=model_helper.beam_size)
beam_initial_states = deco_cell.zero_state(model_helper.batch_size * model_helper.beam_size, tf.float32).clone(cell_state=tiled_states)
start_tokens = tf.fill([model_helper.batch_size], model_helper.GO_ID)
end_token = model_helper.END_ID

decoder = tf.contrib.seq2seq.BeamSearchDecoder(cell=deco_cell,
                                               embedding=self.embed_dict,
                                               start_tokens=start_tokens,
                                               end_token=end_token,
                                               initial_state=beam_initial_states,
                                               beam_width=model_helper.beam_size,
                                               output_layer=output_layer)

After this code, I run the decoder with dynamic_decode. I get a very low score with BeamSearchDecoder but a reasonable one with the greedy decoder. Could someone tell me why?
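
One thing that may be worth checking when comparing the two decoders, although nothing in this thread confirms it is the cause: the greedy path yields one token sequence per sentence, while the beam search `predicted_ids` carries an extra beam dimension (`[batch_size, T, beam_width]` in batch-major layout, with beams ordered from best to worst). Scoring it without first selecting a single beam would compare the wrong tokens. The sketch below shows the selection on nested Python lists; `top_beam` is a hypothetical helper, and on a batch-major tensor the equivalent would be `predicted_ids[:, :, 0]`.

```python
def top_beam(predicted_ids):
    """Pick beam 0 from a nested list laid out as [batch][time][beam]."""
    return [[step[0] for step in sentence] for sentence in predicted_ids]


# Two sentences, two time steps, beam width 2 (made-up token ids).
predicted_ids = [
    [[11, 12], [13, 14]],
    [[21, 22], [23, 24]],
]
best = top_beam(predicted_ids)
print(best)  # [[11, 13], [21, 23]]
```

If `output_time_major=True` is passed to `dynamic_decode`, the first two axes are swapped, so the indexing would need to change accordingly.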

I also have another question. FinalBeamSearchDecoderOutput contains (predicted_ids, beam_search_decoder_output), but when I print predicted_ids and beam_search_decoder_output.predicted_ids, the two are different.
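
On the second question, the difference is expected in TF 1.x: `beam_search_decoder_output.predicted_ids` holds the raw token chosen in each beam slot at each step, whereas the top-level `predicted_ids` is produced by following `parent_ids` backwards through the steps (the `gather_tree` op), so each column is one complete hypothesis. Below is a simplified pure-Python sketch of that backtrace for a single batch entry. `backtrace_beams` is a hypothetical name, and the sketch leaves out the end-token and maximum-length handling that the real op performs.

```python
def backtrace_beams(step_ids, parent_ids):
    """Rebuild full hypotheses for one batch entry.

    step_ids[t][k]   : token emitted at step t in beam slot k
    parent_ids[t][k] : beam slot at step t-1 that slot k was extended from
    Returns a [time][beam] list where column k is the full path ending in slot k.
    """
    num_steps = len(step_ids)
    beam_width = len(step_ids[0])
    result = [[None] * beam_width for _ in range(num_steps)]
    for k in range(beam_width):
        slot = k
        # Walk from the last step to the first, following parent pointers.
        for t in range(num_steps - 1, -1, -1):
            result[t][k] = step_ids[t][slot]
            slot = parent_ids[t][slot]
    return result


# Made-up example: 3 steps, beam width 2.
step_ids = [[5, 7], [8, 9], [3, 4]]
parent_ids = [[0, 0], [1, 1], [0, 1]]
final_ids = backtrace_beams(step_ids, parent_ids)
print(final_ids)  # [[7, 7], [8, 9], [3, 4]]
```

In this example both surviving hypotheses descend from slot 1 at step 0, so the backtraced first row is `[7, 7]` while the raw first row is `[5, 7]`. The two arrays agree only when every beam slot always extends its own previous slot.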

HenryL-study commented 6 years ago

Hi, I have the same problem with the beam search decoder. Have you fixed it?

husztidorottya commented 5 years ago

Hi! Same problem here. Any suggestions?