mila-iqia / blocks-examples

Examples and scripts using Blocks
MIT License

Machine translation 'translation' mode giving "</S> </S> </S> </S> </S> </S> . </S> . . </S> </S>" #110

Open · HuidaQ opened this issue 6 years ago

HuidaQ commented 6 years ago

I'm trying to train an NMT model on the commoncrawl data (from http://www.statmt.org/wmt15/translation-task.html). Training seems to be going fine. Here is a paste of part of the log:

Training status:
         batch_interrupt_received: False
         epoch_interrupt_received: False
         epoch_started: True
         epochs_done: 0
         iterations_done: 558
         received_first_batch: True
         resumed_from: None
         training_started: True
Log records from the iteration 558:
         decoder_cost_cost: 134.178924561

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Training status:
         batch_interrupt_received: False
         epoch_interrupt_received: False
         epoch_started: True
         epochs_done: 0
         iterations_done: 559
         received_first_batch: True
         resumed_from: None
         training_started: True
Log records from the iteration 559:
         decoder_cost_cost: 148.906814575

Input :  Ce morceau de code fournit un aperçu de votre travail , une brève description , et un bouton &gt; Achetez maintenant . </S>
Target:  This bit of code provides a preview of your work , a brief description , and a &gt; Buy Now button . </S>
Sample:  information taxi that It to <UNK> to <UNK> while work . </S>
Sample cost:  230.718

Input :  C ’ est la question écrite que pose sans <UNK> la société de gestion <UNK> Active <UNK> à l&apos; assemblée 2010 de <UNK> . </S>
Target:  This is the written question which puts the asset manager company <UNK> Active Investors to the 2010 of the <UNK> . </S>
Sample:  a share the look . </S>
Sample cost:  314.725

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Training status:
         batch_interrupt_received: False
         epoch_interrupt_received: False
         epoch_started: True
         epochs_done: 0
         iterations_done: 560
         received_first_batch: True
         resumed_from: None
         training_started: True
Log records from the iteration 560:
         decoder_cost_cost: 164.382553101

INFO:machine_translation.checkpoint: Saving model
INFO:machine_translation.checkpoint: ...saving parameters
INFO:machine_translation.checkpoint: ...saving iteration state
INFO:machine_translation.checkpoint: ...saving log
INFO:machine_translation.checkpoint: Model saved, took 13.1227090359 seconds.
.
.
.

But when I use the translate mode (from https://github.com/mila-udem/blocks-examples/pull/43/files#r62537666), even if I pick a sentence from the training data itself, it always gives me a sequence of tokens like '</S>', '.', or 'the'. Here's the translation log:

blocks-examples $ python -m machine_translation --proto get_config_fr2en_cc --mode translate --test-file data/my_test.fr.tok
INFO:__main__:Model options:
{'batch_size': 80,
 'beam_size': 12,
 'bleu_script': './data/multi-bleu.perl',
 'bleu_val_freq': 5000,
 'bos_token': '<S>',
 'dec_embed': 620,
 'dec_nhids': 1000,
 'dropout': 1.0,
 'enc_embed': 620,
 'enc_nhids': 1000,
 'eos_token': '</S>',
 'finish_after': 1000000,
 'hook_samples': 2,
 'normalized_bleu': True,
 'output_val_set': True,
 'reload': True,
 'sampling_freq': 13,
 'save_freq': 10,
 'saveto': 'search_model_fr2en_cc',
 'seq_len': 50,
 'sort_k_batches': 12,
 'src_data': './data/commoncrawl.fr-en.fr.tok.shuf',
 'src_vocab': './data/vocab.fr-en.fr.pkl',
 'src_vocab_size': 30000,
 'step_clipping': 1.0,
 'step_rule': 'AdaDelta',
 'stream': 'stream',
 'test_set': 'data/my_test.fr.tok',
 'trg_data': './data/commoncrawl.fr-en.en.tok.shuf',
 'trg_vocab': './data/vocab.fr-en.en.pkl',
 'trg_vocab_size': 30000,
 'unk_id': 1,
 'unk_token': '<UNK>',
 'val_burn_in': 80000,
 'val_set': './data/newstest2013.fr.tok',
 'val_set_grndtruth': './data/newstest2013.en.tok',
 'val_set_out': 'search_model_fr2en_cc/validation_out.txt',
 'weight_noise_ff': False,
 'weight_noise_rec': False,
 'weight_scale': 0.01}
INFO:machine_translation:Building RNN encoder-decoder
INFO:machine_translation:Creating theano variables
INFO:machine_translation:Building sampling model
INFO:machine_translation:Loading the model..
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /bidirectionalencoder/bidirectionalwmt15/backward.initial_state
INFO:machine_translation.checkpoint: Loaded to CG (2000,)        : /bidirectionalencoder/back_fork/fork_gate_inputs.b
INFO:machine_translation.checkpoint: Loaded to CG (620, 2000)    : /bidirectionalencoder/back_fork/fork_gate_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /bidirectionalencoder/back_fork/fork_inputs.b
INFO:machine_translation.checkpoint: Loaded to CG (620, 1000)    : /bidirectionalencoder/back_fork/fork_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /bidirectionalencoder/bidirectionalwmt15/forward.initial_state
INFO:machine_translation.checkpoint: Loaded to CG (2000,)        : /bidirectionalencoder/fwd_fork/fork_gate_inputs.b
INFO:machine_translation.checkpoint: Loaded to CG (620, 2000)    : /bidirectionalencoder/fwd_fork/fork_gate_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /bidirectionalencoder/fwd_fork/fork_inputs.b
INFO:machine_translation.checkpoint: Loaded to CG (620, 1000)    : /bidirectionalencoder/fwd_fork/fork_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /decoder/sequencegenerator/att_trans/decoder/state_initializer/linear_0.b
INFO:machine_translation.checkpoint: Loaded to CG (30000, 620)   : /bidirectionalencoder/embeddings.W
INFO:machine_translation.checkpoint: Loaded to CG (1000, 2000)   : /bidirectionalencoder/bidirectionalwmt15/forward.state_to_gates
INFO:machine_translation.checkpoint: Loaded to CG (1000, 1000)   : /bidirectionalencoder/bidirectionalwmt15/forward.state_to_state
INFO:machine_translation.checkpoint: Loaded to CG (1000, 2000)   : /bidirectionalencoder/bidirectionalwmt15/backward.state_to_gates
INFO:machine_translation.checkpoint: Loaded to CG (1000, 1000)   : /bidirectionalencoder/bidirectionalwmt15/backward.state_to_state
INFO:machine_translation.checkpoint: Loaded to CG (1000, 1000)   : /decoder/sequencegenerator/att_trans/decoder/state_initializer/linear_0.W
INFO:machine_translation.checkpoint: Loaded to CG (1000, 2000)   : /decoder/sequencegenerator/att_trans/decoder.state_to_gates
INFO:machine_translation.checkpoint: Loaded to CG (2000, 1000)   : /decoder/sequencegenerator/att_trans/attention/preprocess.W
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /decoder/sequencegenerator/att_trans/attention/preprocess.b
INFO:machine_translation.checkpoint: Loaded to CG (1000, 1000)   : /decoder/sequencegenerator/att_trans/attention/state_trans/transform_states.W
INFO:machine_translation.checkpoint: Loaded to CG (1000, 1)      : /decoder/sequencegenerator/att_trans/attention/energy_comp/linear.W
INFO:machine_translation.checkpoint: Loaded to CG (2000, 2000)   : /decoder/sequencegenerator/att_trans/distribute/fork_gate_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (1000, 1000)   : /decoder/sequencegenerator/readout/merge/transform_states.W
INFO:machine_translation.checkpoint: Loaded to CG (30000, 620)   : /decoder/sequencegenerator/readout/lookupfeedbackwmt15/lookuptable.W
INFO:machine_translation.checkpoint: Loaded to CG (620, 1000)    : /decoder/sequencegenerator/readout/merge/transform_feedback.W
INFO:machine_translation.checkpoint: Loaded to CG (2000, 1000)   : /decoder/sequencegenerator/readout/merge/transform_weighted_averages.W
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /decoder/sequencegenerator/readout/initializablefeedforwardsequence/maxout_bias.b
INFO:machine_translation.checkpoint: Loaded to CG (500, 620)     : /decoder/sequencegenerator/readout/initializablefeedforwardsequence/softmax0.W
INFO:machine_translation.checkpoint: Loaded to CG (620, 30000)   : /decoder/sequencegenerator/readout/initializablefeedforwardsequence/softmax1.W
INFO:machine_translation.checkpoint: Loaded to CG (30000,)       : /decoder/sequencegenerator/readout/initializablefeedforwardsequence/softmax1.b
INFO:machine_translation.checkpoint: Loaded to CG (620, 2000)    : /decoder/sequencegenerator/fork/fork_gate_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (2000,)        : /decoder/sequencegenerator/fork/fork_gate_inputs.b
INFO:machine_translation.checkpoint: Loaded to CG (1000, 1000)   : /decoder/sequencegenerator/att_trans/decoder.state_to_state
INFO:machine_translation.checkpoint: Loaded to CG (2000, 1000)   : /decoder/sequencegenerator/att_trans/distribute/fork_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (620, 1000)    : /decoder/sequencegenerator/fork/fork_inputs.W
INFO:machine_translation.checkpoint: Loaded to CG (1000,)        : /decoder/sequencegenerator/fork/fork_inputs.b
INFO:machine_translation.checkpoint: Number of parameters loaded for computation graph: 37
INFO:machine_translation:Started translation:
INFO:machine_translation:Source: ([769, 6, 2979, 2, 1177, 7173, 27, 180, 79, 3494, 3, 2263, 5, 734, 4, 29999],)
INFO:machine_translation:Translated: </S> </S> </S> </S> </S> </S> . </S> . . </S> </S>
INFO:machine_translation:Total cost of the test: 0.929869651794

The 'my_test.fr.tok' file has only one line: Sur la baie de San Antonio vous avez tous commerces , bars et restaurants .
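For reference, this is roughly how I check that the test sentence is being mapped to the indices shown in the translation log above (a quick sketch, assuming the vocab pickle produced by prepare_data.py is a plain {token: id} dict; the appended 29999 should be the </S> id, i.e. src_vocab_size - 1):

import pickle

src_vocab_size = 30000
unk_id = 1

with open('./data/vocab.fr-en.fr.pkl', 'rb') as f:
    src_vocab = pickle.load(f)  # under Python 3 this may need encoding='latin1'

sentence = ('Sur la baie de San Antonio vous avez tous commerces , '
            'bars et restaurants .')

# Map each token to its id, falling back to unk_id for out-of-vocabulary words,
# then append the end-of-sequence id (29999 in the translation log above).
ids = [src_vocab.get(w, unk_id) for w in sentence.split()]
ids.append(src_vocab_size - 1)

print(ids)  # should match the 'Source: ([769, 6, 2979, ...],)' line above

None of the ids in the log is 1 (the unk_id) and the last one is 29999, so the input side looks fine to me; the problem seems to be on the decoding side.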

Appreciate any help. Thanks.

dmitriy-serdyuk commented 6 years ago

Perhaps something is wrong with your dataset. This sample doesn't look good:

Input :  Ce morceau de code fournit un aperçu de votre travail , une brève description , et un bouton &gt; Achetez maintenant . </S>
Target:  This bit of code provides a preview of your work , a brief description , and a &gt; Buy Now button . </S>
Sample:  information taxi that It to <UNK> to <UNK> while work . </S>
Sample cost:  230.718
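
One quick thing to check, since the config points at the shuffled files: make sure the source and target sides are still line-aligned after shuffling. A rough sketch (paths taken from your config dump above):

src_path = './data/commoncrawl.fr-en.fr.tok.shuf'
trg_path = './data/commoncrawl.fr-en.en.tok.shuf'

# Print the first few line pairs; they should be translations of each other.
# If the two files were shuffled independently, the pairs are unrelated
# and the model is effectively trained on noise.
with open(src_path) as src, open(trg_path) as trg:
    for i in range(5):
        print('pair %d' % i)
        print('  FR: ' + src.readline().strip())
        print('  EN: ' + trg.readline().strip())

The two files should also have exactly the same number of lines (wc -l).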
HuidaQ commented 6 years ago

I tried the default dataset from the prepare_data.py script (parallel-nc-v10) and got the same thing.
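
For what it's worth, I'll also double-check the special tokens in the target vocab (a rough sketch, assuming that pickle is a plain {token: id} dict like the source one; the config expects unk_id 1, and on the source side </S> maps to vocab_size - 1):

import pickle

with open('./data/vocab.fr-en.en.pkl', 'rb') as f:
    trg_vocab = pickle.load(f)  # may need encoding='latin1' under Python 3

# Special tokens from the config: <S>, <UNK>, </S>.
print('vocab size: %d' % len(trg_vocab))
for tok in ('<S>', '<UNK>', '</S>'):
    print('%s -> %s' % (tok, trg_vocab.get(tok)))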