martinpopel opened this issue 6 years ago
Sentences longer than the parameter `max_length` are excluded from training, and lowering this parameter helps to prevent OOM errors and allows a higher `batch_size`, so it is quite useful. Unfortunately, setting this parameter too low results in low BLEU and retarded learning curves. The graph below shows curves (evaluated on the dev set) for `max_length` 25, 50, 70, 150, 200 and 400:

*(graph: dev-set BLEU learning curves for the six `max_length` values)*

There are two possible explanations, but I think both of them are false:
- `max_length` too low makes the training data smaller. However, with `max_length=70` only 2.1% of my training sentences are excluded. Moreover, the "70" BLEU curve is decreasing after the first hour of training, while processing the whole training data (one epoch) takes more than two days of training.
When I increased the `batch_size` from 1500 to 2000, the results improved: the "25" and "50" curves were still retarded, but "70" and higher achieved the same result as when training without any `max_length` restriction.

Can someone explain this? Or even fix it if it is a bug?
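For context: in tensor2tensor, the Transformer's `batch_size` counts subwords per padded batch, not sentences. Below is a minimal pure-Python sketch of the filter-then-pack mechanics described above; `make_batches` is a hypothetical helper, and the global sort is only a crude stand-in for t2t's length bucketing, so this mimics the real pipeline loosely.

```python
def make_batches(tokenized_sentences, max_length, batch_size):
    """Group subword-id lists into padded batches of at most batch_size subwords.

    Assumptions (not t2t's actual code): batches are padded to their longest
    member, and batch_size is measured in subwords per padded batch.
    """
    # The max_length filter: sentences longer than max_length are dropped.
    kept = [s for s in tokenized_sentences if len(s) <= max_length]
    kept.sort(key=len)  # crude stand-in for length bucketing

    batches, current, longest = [], [], 0
    for sent in kept:
        new_longest = max(longest, len(sent))
        # Padded size of the batch if this sentence were added.
        if current and (len(current) + 1) * new_longest > batch_size:
            batches.append(current)
            current, new_longest = [], len(sent)
        current.append(sent)
        longest = new_longest
    if current:
        batches.append(current)
    return batches

if __name__ == "__main__":
    import random
    random.seed(0)
    data = [[0] * random.randint(5, 120) for _ in range(1000)]
    for ml in (25, 70, 400):
        print(ml, len(make_batches(data, max_length=ml, batch_size=2000)), "batches")
```

Under these assumptions, a batch of length-70 sentences at `batch_size=2000` holds about 28 sentences, while length-25 sentences pack about 80 per batch; longer sequences also cost more activation memory per subword (attention is quadratic in length), which is presumably why a lower `max_length` helps against OOM.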
@martinpopel are these numbers from tensor2tensor 1.2.9 or from a more recent version? (I ask this in relation to bug #529, as 1.2.9 is the version some of us are working in.)

@noe: Yes, these numbers (the graph) are with 1.2.9.

@martinpopel How did you find out how many subwords your sentences have?

@mehmedes: using this ad-hoc script.
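The ad-hoc script itself is only linked, not shown in the thread; a minimal sketch of how such a count might be done with tensor2tensor's `SubwordTextEncoder` (both file paths below are assumptions):

```python
# Hypothetical sketch, not martinpopel's actual script: print the number
# of subwords in each training sentence.
from tensor2tensor.data_generators.text_encoder import SubwordTextEncoder

# Both paths are assumptions; substitute your own vocab and corpus files.
encoder = SubwordTextEncoder("vocab.translate_ende_wmt32k.32768.subwords")
with open("train.en") as corpus:
    for line in corpus:
        print(len(encoder.encode(line.rstrip("\n"))))
```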