Currently, there is no truncation of the input text in the back translation augmenter. This leads to hard-to-parse errors when input text longer than the model's max input length is provided (and the model is running on the GPU). This PR fixes that by passing `truncation=True` to the HF tokenizer, which truncates any text longer than the model's max input size.
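A minimal sketch of the change, assuming a typical HF seq2seq setup (the checkpoint name is illustrative, not necessarily the one the augmenter uses):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative translation checkpoint; the augmenter may use a different one.
checkpoint = "Helsinki-NLP/opus-mt-en-de"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

text = "some input text that may exceed the model's max input length ..."

# Before: over-long inputs surfaced as hard-to-parse errors downstream.
# After: truncation=True clips the encoded input to the tokenizer's
# model_max_length before it ever reaches the model.
inputs = tokenizer(text, return_tensors="pt", truncation=True)

outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```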
Closes #297