nyu-dl / dl4mt-nonauto

BSD 3-Clause "New" or "Revised" License

Training error (num_gpu argument) #3

Closed butsugiri closed 5 years ago

butsugiri commented 5 years ago

Thank you for sharing the code!

I tried running your model with multiple GPU settings as follows, and I got an error from BucketIterator. It seems that BucketIterator (from torchtext) does not accept num_gpus argument. I am using torchtext (version 0.3.1).

python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight --use_distillation --num_gpus 3
2019-02-20 16:05:59 INFO: - random seed is 19920206
2019-02-20 16:05:59 INFO: - TRAINING CORPUS : /work01/kiyono/dl4mt-nonauto-data/iwslt/en-de/distill/ende/train.tags.en-de.bpe
2019-02-20 16:06:02 INFO: - before pruning : 195897 training examples
2019-02-20 16:06:02 INFO: - after pruning : 195897 training examples
Traceback (most recent call last):
  File "run.py", line 572, in <module>
    num_gpus=args.num_gpus)
TypeError: __init__() got an unexpected keyword argument 'num_gpus'
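
(For context, the TypeError is just Python rejecting an unknown keyword: a constructor that does not define a num_gpus parameter cannot accept one. A minimal sketch of the failure mode, using a stand-in class rather than the real torchtext iterator:)

```python
# Stand-in for torchtext.data.BucketIterator (0.3.1), whose constructor
# does not define a num_gpus parameter.
class BucketIterator:
    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

try:
    # mirrors the failing call in run.py
    BucketIterator([], batch_size=32, num_gpus=3)
except TypeError as e:
    print(e)  # e.g. "__init__() got an unexpected keyword argument 'num_gpus'"
```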

Do you have any ideas about how to avoid this error?

baoy-nlp commented 5 years ago

You can check out the "multigpu" branch.

butsugiri commented 5 years ago

I am already using multigpu branch (commit: e15acb2601cfa483394e1897ffeaff449a3a95fc).

In https://github.com/nyu-dl/dl4mt-nonauto/blob/multigpu/run.py#L536-L537, there is a num_gpus argument, which is not available in torchtext.data.BucketIterator (https://torchtext.readthedocs.io/en/latest/data.html#torchtext.data.BucketIterator)
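
(One possible workaround without a modified torchtext, sketched as an untested idea: a thin subclass that swallows the num_gpus keyword before delegating to the stock constructor. MultiGPUBucketIterator is a hypothetical name, and the BucketIterator below is a minimal stand-in; in practice you would subclass the real torchtext class.)

```python
# Minimal stand-in for torchtext.data.BucketIterator.
class BucketIterator:
    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

class MultiGPUBucketIterator(BucketIterator):
    """Hypothetical shim: accept num_gpus, store it, and pass the
    remaining arguments through to the stock constructor."""
    def __init__(self, *args, num_gpus=1, **kwargs):
        self.num_gpus = num_gpus
        super().__init__(*args, **kwargs)

it = MultiGPUBucketIterator([], batch_size=32, num_gpus=3)
print(it.num_gpus)  # 3
```

Note this only restores the stock single-GPU behavior; the author's fork presumably uses num_gpus to shard batches across devices, which a shim like this does not do.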

mansimov commented 5 years ago

Hi,

I forgot to add to the README that you need to use my modified torchtext, which supports the num_gpus argument: https://github.com/mansimov/pytorch_text_multigpu

I will update the README. Also, try using PyTorch 0.4.* for consistency. Can you try it and let me know?

butsugiri commented 5 years ago

Thank you for your reply! I will try the modified version and see what happens.

baoy-nlp commented 5 years ago

Thank you very much for sharing. I would like to ask how we can run the code to match the performance reported in the paper, specifically the IWSLT'16 En-De experiment. I have tried running it, but my BLEU score is always about five to six points below the paper's. Could you share the exact settings for IWSLT En-De? Thank you very much.

mansimov commented 5 years ago

Off the top of my head, try running the following command in the main branch:

python run.py --dataset iwslt-ende --vocab_size 40000 --load_vocab --ffw_block highway --params small --batch_size 2048 --eval_every 1000 --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight --use_distillation

After training, you need to train the length prediction module by running the above command again with --load_from pointing to the trained model, plus --resume --trg_len_option predict --finetune_trg_len
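
(Concretely, the two stages might look like the sketch below. The <stage1_checkpoint> placeholder stands for wherever your stage-1 run saved its model, and the 0.5 value for --denoising_prob is assumed from the command in the original report above.)

```shell
# Stage 1: train the non-autoregressive model (main branch)
python run.py --dataset iwslt-ende --vocab_size 40000 --load_vocab \
    --ffw_block highway --params small --batch_size 2048 --eval_every 1000 \
    --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax \
    --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight \
    --use_distillation

# Stage 2: fine-tune the length prediction module from the stage-1 checkpoint
python run.py --dataset iwslt-ende --vocab_size 40000 --load_vocab \
    --ffw_block highway --params small --batch_size 2048 --eval_every 1000 \
    --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax \
    --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight \
    --use_distillation \
    --load_from <stage1_checkpoint> --resume \
    --trg_len_option predict --finetune_trg_len
```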

The command is similar in the multigpu branch:

python run.py --dataset iwslt-ende --vocab_size 40000 --load_vocab --ffw_block highway --params small --batch_size 2048 --num_gpus 2 --eval_every 1000 --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight --use_distillation

butsugiri commented 5 years ago

@mansimov I installed the modified version of torchtext and confirmed that the training actually works. Thank you again for your advice.

mansimov commented 5 years ago

Great! @butsugiri & @baoy-nlp feel free to ask me any other questions and update me on your progress!