I just noticed that dataset preprocessing uses the preprocess_pretrain_dataset function to concatenate different examples and chunk them into blocks of cutoff_len. Doesn't that mean preprocessing throws away the final part of the concatenated sequence whose length is shorter than cutoff_len? It seems wasteful to discard up to cutoff_len tokens for every batch of 1000 examples. I'm not sure — is this a common way to handle sequences in pretraining?
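For reference, here is a minimal sketch of the concatenate-and-chunk behavior I'm describing. The function name and batch shape are illustrative, not the actual preprocess_pretrain_dataset code:

```python
# Hypothetical sketch of concatenate-and-chunk preprocessing, where the
# trailing tokens that don't fill a full cutoff_len block are dropped.

def chunk_token_ids(batched_token_ids, cutoff_len):
    """Concatenate token-id lists from one batch and split into fixed blocks.

    Any remainder shorter than cutoff_len is discarded, which is the
    potential waste raised above.
    """
    concatenated = [tok for ids in batched_token_ids for tok in ids]
    # Keep only the largest multiple of cutoff_len; the tail is thrown away.
    total_len = (len(concatenated) // cutoff_len) * cutoff_len
    return [
        concatenated[i : i + cutoff_len]
        for i in range(0, total_len, cutoff_len)
    ]

# With cutoff_len=4, these 10 tokens yield 2 full chunks; tokens 9 and 10
# are dropped.
chunks = chunk_token_ids([[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]], cutoff_len=4)
```

So with a batched map (e.g. batch_size=1000), each batch can lose up to cutoff_len - 1 tokens at its end.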