You can subclass the Seq2SeqTrainer and override the _get_train_sampler method. Instead of creating a RandomSampler object, create a SequentialSampler.
```python
from transformers.trainer_seq2seq import Seq2SeqTrainer
from torch.utils.data import SequentialSampler

class SequentialSeq2SeqTrainer(Seq2SeqTrainer):
    def _get_train_sampler(self) -> SequentialSampler:
        # Yield indices in storage order instead of shuffling them.
        return SequentialSampler(self.train_dataset)
```
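For context, here is a minimal usage sketch; the `model`, `train_dataset`, and output directory below are placeholders, not from the original thread:

```python
from transformers import Seq2SeqTrainingArguments

# Placeholder objects: substitute your own model and dataset.
args = Seq2SeqTrainingArguments(output_dir="out", per_device_train_batch_size=2)
trainer = SequentialSeq2SeqTrainer(
    model=model,                 # placeholder: your seq2seq model
    args=args,
    train_dataset=train_dataset, # placeholder: your dataset, in storage order
)
trainer.train()  # batches are now drawn in the order the data are stored
```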
Thank you!! I'll try it as you suggested.
Feature request
I want to train the model in the order in which the data are stored.
For example, with 100 examples and batch_size=2, I want to feed the 1st and 2nd examples together, then the 3rd and 4th, then the 5th and 6th, and so on.
But the Hugging Face Trainer shuffles the training data with a RandomSampler (seeded via the seed/data_seed arguments), so examples are not fed in storage order.
How can I train the model while feeding the data in the order in which they are stored?
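For reference, the desired ordering is what a plain PyTorch SequentialSampler produces; a minimal sketch, with a toy list standing in for the stored dataset:

```python
from torch.utils.data import DataLoader, SequentialSampler

data = list(range(100))  # toy stand-in for 100 stored examples
loader = DataLoader(data, batch_size=2, sampler=SequentialSampler(data))

for batch in loader:
    print(batch)  # tensor([0, 1]), then tensor([2, 3]), then tensor([4, 5]), ...
    break
```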
Motivation
I want to control the order in which data are fed to the model.
Your contribution
I want to control the order in which data are fed to the model.