tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial

Why does inference run 20X slower than training? #204

Closed David-Levinthal closed 6 years ago

David-Levinthal commented 6 years ago

For example, translating a file made by concatenating newstest2009 through newstest2016, using a converged 4-layer model from standard_hparams:

done, num sentences 22191, num translations per input 1, time 1498s, Mon Dec 4 11:05:44 2017. bleu: 27.8

while training that model produced output like:

global step 339800 lr 0.00195312 step-time 0.57s wps 12.53K ppl 8.93 bleu 28.86
global step 339900 lr 0.00195312 step-time 0.58s wps 12.54K ppl 9.11 bleu 28.86
global step 340000 lr 0.000976562 step-time 0.57s wps 12.56K ppl 9.15 bleu 28.86

Best bleu, step 340000 step-time 0.57 wps 12.56K, dev ppl 5.58, dev bleu 28.9, test ppl 5.54, test bleu 29.7, Mon Nov 20 00:40:57 2017

I wonder if this is due to printing out the translated sentences to a file?

TF R1.4, NMT tf-1.4, cuda9 cudnn7, ubuntu 16.04, V100

David-Levinthal commented 6 years ago

A bit more info, just to do the math: inference 1498 s / 22191 sentences = 67.5 msec per sentence; training 0.57 s / 128 sentences per step = 4.4 msec per sentence.
OK, only ~15X.

German-to-English translation example from the README, 32K vocabulary.
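
The same arithmetic as a tiny Python check (the 128 is the per-step training batch size assumed in the numbers above):

```python
# Back-of-the-envelope comparison of per-sentence time.
infer_time_s = 1498.0          # total inference wall time
num_sentences = 22191          # newstest2009-2016 concatenated
train_step_time_s = 0.57       # reported time per training step
train_batch_size = 128         # sentences per training step (assumption)

infer_ms = infer_time_s / num_sentences * 1000            # ~67.5 ms/sentence
train_ms = train_step_time_s / train_batch_size * 1000    # ~4.4 ms/sentence
print(infer_ms / train_ms)                                 # ~15x slower
```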

oahziur commented 6 years ago

@David-Levinthal

It is expected that inference will be slower than training because we use teacher forcing during training, but we have to generate each word autoregressively during inference.

In addition, I think you are using beam search, so you are evaluating batch_size * beam_width hypotheses at each time step during inference.
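
Not the nmt code, just a minimal sketch of why the two regimes differ; decoder_step, start_token, etc. are hypothetical stand-ins:

```python
# Hypothetical sketch of teacher forcing vs. autoregressive decoding.

def train_step(decoder_step, target_tokens, state):
    # Teacher forcing: the ground-truth token is fed at every step, so the
    # whole target sequence is known up front and steps batch efficiently.
    for gold in target_tokens:
        logits, state = decoder_step(gold, state)
    return state

def infer(decoder_step, start_token, end_token, state, max_len=200):
    # Autoregressive decoding: each step must wait for the previous output,
    # and with beam search every step scores batch_size * beam_width
    # hypotheses instead of batch_size sequences.
    token, output = start_token, []
    for _ in range(max_len):
        logits, state = decoder_step(token, state)
        token = logits.argmax()          # greedy pick, for illustration only
        if token == end_token:
            break
        output.append(token)
    return output
```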

David-Levinthal commented 6 years ago

is the beam search not applied during training?


David-Levinthal commented 6 years ago

BTW, directing the output to /dev/null changes nothing: done, num sentences 22191, num translations per input 1, time 1494s. The regular run writing to a file on the same GPU/machine took 1498s. Just thought I would share the measurement. d


oahziur commented 6 years ago

@David-Levinthal

Beam Search is not used during training.

Why do you expect that sending the output to /dev/null would change the measurement? The bottleneck should be the decoding process.

David-Levinthal commented 6 years ago

Just checking because it was easy :-) I will look into profiling the inference and share the results if you like. I worry that a latency of 50-100 msec/sentence may be problematic for somebody building a cloud service.


oahziur commented 6 years ago

@David-Levinthal

Try setting beam_width=0 and see how much faster you can get. I think you can also try using a larger batch size and pre-sorting your inference file by sequence length.
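
A minimal sketch of the pre-sorting idea (beam_width and infer_batch_size themselves are set via the hparams JSON or flags as discussed in this thread; the file names here are hypothetical). Sorting changes the output order, so the permutation has to be undone before scoring BLEU:

```python
# Sort the inference source file by token count so that sentences of similar
# length land in the same batch (less padding waste), keeping the permutation
# so translations can be restored to the original order afterwards.

with open("newstest_all.de", encoding="utf-8") as f:        # hypothetical name
    lines = f.read().splitlines()

order = sorted(range(len(lines)), key=lambda i: len(lines[i].split()))

with open("newstest_all.sorted.de", "w", encoding="utf-8") as f:
    f.write("\n".join(lines[i] for i in order) + "\n")

# After running inference on the sorted file, undo the permutation:
with open("output.sorted.en", encoding="utf-8") as f:        # hypothetical name
    translated = f.read().splitlines()

restored = [None] * len(translated)
for rank, original_index in enumerate(order):
    restored[original_index] = translated[rank]

with open("output.en", "w", encoding="utf-8") as f:
    f.write("\n".join(restored) + "\n")
```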

David-Levinthal commented 6 years ago

I just started a run with infer_batch_size set to 128 in the json file. I will crank down the beam size in a follow-up (also with infer_batch_size = 128) :-) Thanks for the suggestion. I have an early version of a whitepaper on RNN performance evaluation using translators (destined for posting on a github site I use); can I get you to look it over for mistakes? I have not yet started on a discussion of inference for reasons all too apparent, i.e. I don't understand what is happening :-) d


David-Levinthal commented 6 years ago

Rui, you are exactly right.

infer_batch_size = 128, beam_width = 10: done, num sentences 22191, num translations per input 1, time 1664s, bleu: 27.8
infer_batch_size = 32, beam_width = 1: done, num sentences 22191, num translations per input 1, time 308s, bleu: 26.2
infer_batch_size = 32, beam_width = 0: done, num sentences 22191, num translations per input 1, time 330s, bleu: 26.2
infer_batch_size = 128, beam_width = 1: done, num sentences 22191, num translations per input 1, time 217s, bleu: 26.2

The final configuration works out to 9.8 msec/sentence.
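
For reference, the per-sentence latency implied by each run, just a recomputation of the numbers above:

```python
# Per-sentence latency for each measured configuration.
num_sentences = 22191
runs = {
    "batch 128, beam 10": 1664.0,
    "batch 32,  beam 1":   308.0,
    "batch 32,  beam 0":   330.0,
    "batch 128, beam 1":   217.0,
}
for name, seconds in runs.items():
    print(f"{name}: {seconds / num_sentences * 1000:.1f} ms/sentence")
# The last configuration comes out to ~9.8 ms/sentence, roughly 7x faster
# than the original 67.5 ms/sentence measurement.
```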


David-Levinthal commented 6 years ago

Happy new year. More data and my conjectures can be found here: https://github.com/David-Levinthal/machine-learning/blob/master/Evaluating%20RNN%20performance%20across%20HW%20platforms.pdf Let me know if anything in it needs correction; I will update the pdf right away. d


oahziur commented 6 years ago

@David-Levinthal Thanks for the update! I will close this issue for now. Feel free to re-open or create new issues if you have more questions.