shrimai / Style-Transfer-Through-Back-Translation

About translation models #16

Closed: g22gs closed this issue 4 years ago

g22gs commented 5 years ago

Hi, I have several questions about the translation models. I was trying to test your pre-trained En-Fr and Fr-En translation models to check their BLEU quality, and I found that a command like "python translate.py -model xxx.pt" only worked for En-Fr but failed for Fr-En. I got an error message like this:

Traceback (most recent call last):
  File "translate.py", line 139, in <module>
    main()
  File "translate.py", line 65, in main
    translator = onmt.Translator(opt)
  File "/home/kouhakuken0609/yasasi/Style-Transfer-Through-Back-Translation/style_decoder/onmt/Translator.py", line 26, in __init__
    model.load_state_dict(checkpoint['model'])
KeyError: 'model'

I guess this is because the Fr-En checkpoint only contains the parameters of the encoder layers, am I right? By the way, are your reported BLEU scores for the translation models computed on newstest2015?
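
(To verify this guess, one can load the checkpoint and print its top-level keys; the path below is only my assumption of where the Fr-En model lives:)

python - <<'EOF'
import torch
# Assumed location of the pre-trained Fr-En checkpoint; adjust as needed.
ckpt = torch.load('../models/translation/french_english/french_english.pt', map_location='cpu')
# translate.py's '-model' path expects a dict with a 'model' key;
# if the keys printed here differ, the KeyError above is explained.
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
EOF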

I also want to replace your pre-trained translation models with models for other language pairs and run some tests.
Any advice or suggestions would be greatly appreciated.

shrimai commented 5 years ago

For the Fr-En model you have to use the '-encoder_model' and '-decoder_model' options described in lines 11-18 of the translate.py file. Use the '-model' option only for the En-Fr model; for all other models, use the other two options.

Yes, my BLEU scores are reported on the test set of WMT 2015.

Yes, you can certainly replace the translation model with your own language pair. You will have to train the MT models using the nmt_train.py script and then use your trained models to train the style-transfer generative models.
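
(A minimal sketch of the two steps with placeholder data/model paths; the flags mirror the commands that appear later in this thread:)

# Step 1: train the MT model (placeholder paths).
python nmt_train.py \
    -data data/my_pair.train.pt \
    -save_model models/my_pair-brnn-model \
    -brnn \
    -gpus 0

# Step 2: train the style-transfer generator on top of the trained encoder.
# The encoder checkpoint name follows the pattern nmt_train.py uses when saving.
python train_decoder.py \
    -data data/my_generator.train.pt \
    -save_model models/my_generator \
    -classifier_model ../models/classifier/sentiment_classifier/sentiment_classifier.pt \
    -encoder_model models/my_pair-brnn-model_acc_XX_ppl_YY_eN.pt \
    -tgt_label 0 \
    -gpus 0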

g22gs commented 5 years ago

> For the Fr-En model you have to use the '-encoder_model' and '-decoder_model' options described in lines 11-18 of the translate.py file. Use the '-model' option only for the En-Fr model; for all other models, use the other two options.
>
> Yes, my BLEU scores are reported on the test set of WMT 2015.
>
> Yes, you can certainly replace the translation model with your own language pair. You will have to train the MT models using the nmt_train.py script and then use your trained models to train the style-transfer generative models.

Thank you for the fast reply! So if I want to simply translate text with your Fr-En model, without any style transfer, which generator file should I provide for the '-decoder_model' option? I am currently using the same Fr-En model file for both '-encoder_model' and '-decoder_model', but I am not sure whether that is right.

g22gs commented 5 years ago

Sorry for disturbing you once again, but I couldn't reach the BLEU scores you reported. I am wondering if some of my steps went wrong. My steps are as follows.

https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh

Modifying the script taken from fairseq (linked above), I downloaded the WMT2015 test set and preprocessed it (BPE and tokenization included).

The 4 raw files I got were: newsdiscusstest2015-enfr-src.en.sgm, newsdiscusstest2015-enfr-ref.fr.sgm, newsdiscusstest2015-fren-src.fr.sgm, and newsdiscusstest2015-fren-ref.en.sgm.

After preprocessing, I renamed them to test.enfr-en, test.enfr-fr, test.fren-fr, and test.fren-en, respectively.
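
(For the .sgm files specifically, the fairseq script relies on the Moses SGML helper; roughly the following, assuming a mosesdecoder checkout in the working directory:)

# Strip the SGML wrapper to plain text, one sentence per line.
perl mosesdecoder/scripts/ems/support/input-from-sgm.perl \
    < newsdiscusstest2015-fren-src.fr.sgm > test.fren-fr.raw
perl mosesdecoder/scripts/ems/support/input-from-sgm.perl \
    < newsdiscusstest2015-fren-ref.en.sgm > test.fren-en.raw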

Each file had 1500 lines. I then ran the following commands and got these BLEU scores.

python translate.py \
-model ../models/translation/english_french/english_french.pt \
-src WMT15/test.enfr-en \
-output WMT15/test.enfr-fr.translated \
-replace_unk \
-gpu 0
perl ~/OpenNMT-py/tools/multi-bleu.perl test.enfr-fr < test.enfr-fr.translated

BLEU = 24.37, 51.0/29.2/18.9/12.5 (BP=1.000, ratio=1.008, hyp_len=30323, ref_len=30074)

python translate.py \
-encoder_model ../models/translation/english_french/english_french.pt \
-decoder_model ../models/translation/english_french/english_french.pt \
-src WMT15/test.fren-fr \
-output WMT15/test.fren-en.translated \
-replace_unk \
-gpu 0
perl ~/OpenNMT-py/tools/multi-bleu.perl test.fren-en < test.fren-en.translated

BLEU = 25.01, 53.7/30.2/19.1/12.6 (BP=1.000, ratio=1.025, hyp_len=28098, ref_len=27421)

Any advice or suggestions would be greatly appreciated.

g22gs commented 5 years ago

> Yes, you can certainly replace the translation model with your own language pair. You will have to train the MT models using the nmt_train.py script and then use your trained models to train the style-transfer generative models.

Another problem came up while replacing the translation models. As a simple test of an En-En setup (source identical to reference), I trained a model with nmt_train.py and got a .pt file. But when I was training the generator model, something went wrong. The command and error message are below.

python train_decoder.py \
-data data/negative_enen_generator.train.pt \
-save_model sentiment_enen_models/negative_enen_generator \
-classifier_model ../models/classifier/sentiment_classifier/sentiment_classifier.pt \
-encoder_model en_en/negative-model_acc_99.88_ppl_1.02_e13.pt \
-tgt_label 0 \
-gpus 0

Namespace(batch_size=64, brnn=True, brnn_merge='concat', class_weight=1.0, classifier_model='../models/classifier/sentiment_classifier/sentiment_classifier.pt', curriculum=False, data='data/negative_enen_generator.train.pt', dropout=0.3, encoder_model='en_en/negative-model_acc_99.88_ppl_1.02_e13.pt', epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, nll_weight=1.0, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, save_model='sentiment_enen_models/negative_enen_generator', sequence_length=50, start_decay_at=8, start_epoch=1, temperature=1.0, tgt_label=0, train_from='', train_from_state_dict='', word_vec_size=300)
Loading data from 'data/negative_enen_generator.train.pt'
 * vocabulary size. source = 8689; target = 8689
 * number of training sentences. 176787
 * maximum batch size. 64
Loading Encoder Model ...
Loading CNN Classifier Model ...
Building model...
* number of parameters: 26374793
DecoderModel(
  (decoder): Decoder(
    (word_lut): Embedding(8689, 300, padding_idx=0)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.3)
      (layers): ModuleList(
        (0): LSTMCell(800, 500)
        (1): LSTMCell(500, 500)
      )
    )
    (attn): GlobalAttention(
      (linear_in): Linear(in_features=500, out_features=500)
      (sm): Softmax()
      (linear_out): Linear(in_features=1000, out_features=500)
      (tanh): Tanh()
    )
    (dropout): Dropout(p=0.3)
  )
  (encoder): Encoder(
    (word_lut): Embedding(8689, 300, padding_idx=0)
    (rnn): LSTM(300, 500, num_layers=2, dropout=0.3)
  )
  (generator): Sequential(
    (0): Linear(in_features=500, out_features=8689)
    (1): LogSoftmax()
  )
  (class_input): Sequential(
    (0): Linear(in_features=500, out_features=9603)
  )
  (class_model): ConvNet(
    (word_lut): Embedding(9603, 300, padding_idx=0)
    (conv1): Conv2d (300, 100, kernel_size=(5, 1), stride=(1, 1))
    (relu1): ReLU()
    (maxpool1): MaxPool3d(kernel_size=(1, 46, 1, 1), stride=(1, 1, 1, 1), padding=0, dilation=1, ceil_mode=False)
    (dropout): Dropout(p=0.2)
    (linear): Linear(in_features=100, out_features=1)
    (sigmoid): Sigmoid()
  )
)
Traceback (most recent call last):
  File "train_decoder.py", line 437, in <module>
    main()
  File "train_decoder.py", line 434, in main
    trainModel(model, trainData, validData, dataset, optim)
  File "train_decoder.py", line 286, in trainModel
    train_loss, train_acc = trainEpoch(epoch)
  File "train_decoder.py", line 247, in trainEpoch
    outputs = model(batch, encStates, context)
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kouhakuken0609/yasasi/Style-Transfer-Through-Back-Translation/style_decoder/onmt/Models_decoder.py", line 116, in forward
    out, dec_hidden, _attn = self.decoder(tgt, enc_hidden, context, init_output)
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kouhakuken0609/yasasi/Style-Transfer-Through-Back-Translation/style_decoder/onmt/Models_decoder.py", line 77, in forward
    output, hidden = self.rnn(emb_t, hidden)
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kouhakuken0609/yasasi/Style-Transfer-Through-Back-Translation/style_decoder/onmt/Models_decoder.py", line 26, in forward
    h_1_i, c_1_i = layer(input, (h_0[i], c_0[i]))
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 615, in forward
    self.bias_ih, self.bias_hh,
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 27, in LSTMCell
    hgates = F.linear(hidden[0], w_hh)
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 837, in linear
    output = input.matmul(weight.t())
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/variable.py", line 386, in matmul
    return torch.matmul(self, other)
  File "/home/kouhakuken0609/anaconda3/envs/py36/lib/python3.6/site-packages/torch/functional.py", line 173, in matmul
    return torch.mm(tensor1, tensor2)
RuntimeError: size mismatch at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCTensorMathBlas.cu:243

I am wondering whether I need to make some adjustments to train the generator with my own translation model.

shrimai commented 5 years ago

Yes, you use the same Fr-En .pt file with both options -encoder_model and -decoder_model.

I did not use BPE; I only used the Moses tokenizer. I calculated BLEU using the Moses multi-bleu script.
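
(For reference, a minimal tokenization sketch; the mosesdecoder path and the .raw file names are assumptions:)

# Tokenize source and reference with the Moses tokenizer, no BPE step.
perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr < test.fren-fr.raw > test.fren-fr
perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < test.fren-en.raw > test.fren-en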

Yes, check the output size of the encoder of your translation model. You might want to print the sizes at the place where it is failing and check that they are what you expect. Also make sure you are using PyTorch 0.3.1.
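
(One way to do that without touching the training loop is to inspect the saved encoder checkpoint; this sketch assumes the state dict is stored under the 'model' key, as translate.py's loading code suggests:)

python - <<'EOF'
import torch
# Path to the trained En-En encoder checkpoint from the command above.
ckpt = torch.load('en_en/negative-model_acc_99.88_ppl_1.02_e13.pt', map_location='cpu')
# Print every encoder parameter and its shape; the hidden size seen here
# must match what the decoder's LSTMCell layers expect (500 in this setup).
for name, param in ckpt['model'].items():
    if name.startswith('encoder'):
        print(name, tuple(param.size()))
EOF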

g22gs commented 5 years ago

Thank you for replying. I am sorry, but I still couldn't reproduce the reported BLEU scores with your translation models. Could you tell me which script you used to preprocess the .sgm files of the WMT2015 test set?

shrimai commented 5 years ago

Sorry for the late reply, I was out on vacation. Can you provide me with your email address? I can send you the translated files and the original tokenized files.

g22gs commented 5 years ago

> Sorry for the late reply, I was out on vacation. Can you provide me with your email address? I can send you the translated files and the original tokenized files.

g22gs60517@gmail.com
Appreciate it.

g22gs commented 5 years ago

Sorry for disturbing you again. I have trained a French-English model using nmt_train.py. The options were as follows.

python nmt_train.py \
    -data $OUTPUT_DIR/wmt14_fr_en.train.pt \
    -save_model $OUTPUT_DIR/wmt14_fr_en-brnn-model \
    -brnn \
    -gpus 0

The pure translation model worked fine without the classifier. However, when I tried to use the encoder of this model for generator training, I got an error message like this:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorMath.cu:243
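
(For what it's worth, a device-side assert is often caused by an out-of-range index reaching the GPU, e.g. an embedding lookup beyond the vocabulary, and rerunning with CUDA_LAUNCH_BLOCKING=1 pins down the failing op. A hedged check of the vocabulary sizes, assuming the legacy OpenNMT-py layout where vocabularies are onmt.Dict objects under the dataset's 'dicts' key; paths are placeholders:)

# Run from the style_decoder directory so the pickled onmt.Dict class resolves.
python - <<'EOF'
import torch
# Placeholder paths: generator training data and the Fr-En encoder checkpoint.
data = torch.load('data/my_generator.train.pt')
ckpt = torch.load('my_fr_en-brnn-model.pt', map_location='cpu')
print('generator src vocab:', data['dicts']['src'].size())
# Assumes the encoder embedding sits under this state-dict key, as in the
# model printout above; its first dimension is the vocabulary size.
emb = ckpt['model']['encoder.word_lut.weight']
print('encoder embedding:', tuple(emb.size()))
EOF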

In any case, I guessed something went wrong in my nmt_train.py settings. Could you tell me exactly which options you used when training the pre-trained translation models with nmt_train.py?