tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0
6.38k stars 1.96k forks

Training too slow #183

Open janenie opened 6 years ago

janenie commented 6 years ago

Hi, is there any way to accelerate the code? I am training with a vocabulary size of only 300 and 30,000 training instances with a maximum length of 50, but it takes almost 1 hour to finish an epoch.

What happened to this version of the code?

Thanks

brightmart commented 6 years ago

I am using a 100k vocabulary and 10 million training sentences; it takes 32 hours to train 127k steps, reaching around 17 BLEU for English-to-Chinese. Batch size is set to 64.

1. Use a batch size as big as your GPU can support.
2. Hidden size is 1024 by default; you can reduce it to 800 or 512 if the GPU runs out of memory.
3. For machine translation, deeper models take longer to train and use more GPU memory, but the performance improvement is small. You can set the number of layers to 2.

Here is the command:

```shell
CUDA_VISIBLE_DEVICES=7 nohup python -m nmt.nmt \
  --attention=normed_bahdanau --src=en --tgt=zh \
  --train_prefix=nmt_data_chinese/train --dev_prefix=nmt_data_chinese/dev \
  --test_prefix=nmt_data_chinese/test --out_dir=nmt_attention_model_big_pte_batch64 \
  --num_train_steps=4800000 --steps_per_stats=100 --num_layers=2 --num_units=800 \
  --dropout=0.5 --metrics=bleu --learning_rate=0.001 --optimizer=adam \
  --encoder_type=bi --batch_size=64 --attention_architecture=gnmt_v2 --src_max_len=25 \
  --subword_option=bpe --unit_type=layer_norm_lstm --vocab_prefix=nmt_data_chinese/vocabulary &
```
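As a rough sanity check on the original question, a back-of-envelope sketch (the numbers below are assumptions taken from the thread, not measurements): epoch time is just steps per epoch times seconds per step, so a slow epoch means either many small batches or slow individual steps.

```python
def steps_per_epoch(num_examples, batch_size):
    """Number of optimizer steps needed to see every example once
    (ceiling division, since the last batch may be partial)."""
    return -(-num_examples // batch_size)

# The original poster's ~30,000 sentence pairs at batch size 64:
steps = steps_per_epoch(30_000, 64)
print(steps)  # 469 steps per epoch
```

If 469 steps take an hour, each step is taking roughly 7-8 seconds, which points at per-step cost (model size, sequence length, CPU-only execution) rather than dataset size.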

yapingzhao commented 6 years ago

I have a question: my training corpus is about 70,000 sentences. What dictionary size is appropriate? Thank you.

brightmart commented 6 years ago

It is a small corpus, so you can try 50k. You can also use a larger vocabulary if your CPU/GPU memory allows.



yapingzhao commented 6 years ago

I would like to ask: is 50k equivalent to 50,000 (dictionary size)? I'm a neural network beginner (smile). Thank you.

brightmart commented 6 years ago

Hi,

50k = 50,000



vikaskumarjha9 commented 5 years ago

@brightmart What is the size of your dev set? Does it matter if we have a bigger dev set, i.e. does training then take longer to complete?

brightmart commented 5 years ago

If you have a big dev set, you can evaluate on just part of it during training.
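One simple way to do that (a hypothetical sketch, not part of the nmt codebase; file paths are placeholders) is to carve off the first N parallel lines of the dev files and point --dev_prefix at the smaller copy, taking the same line range from both sides so source and target stay aligned:

```python
import itertools

def subsample_parallel(src_path, dst_path, n):
    """Copy the first n lines of one side of a parallel corpus.
    Call once per language side with the same n to keep alignment."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(itertools.islice(src, n))

# e.g. shrink a 10k-sentence dev set to 1,000 pairs (paths are examples):
# subsample_parallel("nmt_data/dev.en", "nmt_data/dev_small.en", 1000)
# subsample_parallel("nmt_data/dev.zh", "nmt_data/dev_small.zh", 1000)
```

Since external evaluation decodes the whole dev set and computes BLEU each time, a smaller dev set makes those periodic evaluations cheaper without touching the training data itself.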