tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Setting max_length low makes BLEU unexpectedly worse #582

Open · martinpopel opened this issue 6 years ago

martinpopel commented 6 years ago

Sentences longer than the max_length parameter are excluded from training, so lowering this parameter helps prevent OOM errors and allows a higher batch_size, which makes it quite useful. Unfortunately, setting the parameter too low results in low BLEU and delayed learning curves. The graph below shows curves (evaluated on the dev set) for max_length 25, 50, 70, 150, 200 and 400:

[Figure: 1gpu-max_length-b1500 — BLEU learning curves on the dev set, one curve per max_length value]
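For reference, max_length and batch_size are ordinary hparams, so they can be overridden when building the hparams set. A minimal sketch, assuming the stock transformer_base hparams set (the values are just the ones from the experiments above):

```python
# Minimal sketch: overriding max_length and batch_size on the stock
# transformer_base hparams set (values are just examples from above).
from tensor2tensor.models import transformer

hparams = transformer.transformer_base()
hparams.max_length = 70    # training examples longer than 70 subwords are dropped
hparams.batch_size = 1500  # measured in subword tokens per batch, not sentences
```

The same override can also be passed to t2t-trainer on the command line via the --hparams flag, e.g. --hparams="max_length=70,batch_size=1500".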

There are two possible explanations, but I think both of them are false:

When I increased the batch_size from 1500 to 2000, the results improved: the "25" and "50" curves still lagged behind, but "70" and higher achieved the same result as training without any max_length restriction. Can someone explain this? Or even fix it, if it is a bug?
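One thing worth keeping in mind when interpreting this (my assumption about the mechanism, not a confirmed diagnosis): batch_size here counts subword tokens, not sentences, so the number of sentences that fit in a batch shrinks as sentence length grows. A quick back-of-the-envelope sketch:

```python
# Rough arithmetic, assuming token-based batching: a batch of batch_size
# tokens holds at least batch_size // max_length maximal-length sentences.
for batch_size in (1500, 2000):
    for max_length in (25, 50, 70, 150, 200, 400):
        print("batch_size=%d max_length=%d -> >= %d full-length sentences/batch"
              % (batch_size, max_length, batch_size // max_length))
```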

noe commented 6 years ago

@martinpopel are these numbers from tensor2tensor 1.2.9 or from a more recent version? (I ask in relation to bug #529, as 1.2.9 is the version some of us are working with.)

martinpopel commented 6 years ago

@noe: Yes, these numbers (graph) are with 1.2.9.

mehmedes commented 6 years ago

@martinpopel How did you find out how many subwords your sentences have?

martinpopel commented 6 years ago

@mehmedes: using this ad-hoc script.
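The linked script is not reproduced here, but a minimal sketch of doing the same thing with T2T's SubwordTextEncoder would look like the following (the vocab file name is a placeholder for whatever vocab file your problem generated):

```python
# Hypothetical stand-in for the ad-hoc script: print the subword count of
# each input sentence, using the problem's generated subword vocabulary.
import sys
from tensor2tensor.data_generators import text_encoder

# Placeholder path: substitute the vocab file produced by t2t-datagen.
encoder = text_encoder.SubwordTextEncoder("vocab.translate_ende_wmt32k.32768.subwords")

for line in sys.stdin:
    print(len(encoder.encode(line.strip())))
```

Piping the training corpus through this gives the subword-length distribution, which is how you can tell what fraction of sentences a given max_length excludes.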