Open smash1999 opened 4 days ago
Could you take a look at your training data loader? Try printing the length of the data loader to check whether it actually contains any data.
Here, we will iterate through the train data loader: https://github.com/hpcaitech/ColossalAI/blob/b1172740743998ca08808e2ad4f93a8fc6cf3035/applications/ColossalChat/coati/trainer/sft.py#L100
How can I get the length of the data loader?
I added a line right after for i, batch in enumerate(self.train_dataloader):
but nothing is printed to the log.
Below is the code I added.
coordinator.print_on_master(f"Length of DataLoader: {len(self.train_dataloader)}")
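One possible explanation, offered as a guess rather than a diagnosis: a statement placed inside the for loop body only runs if the loader yields at least one batch, so an empty loader would produce no log at all. Printing the length before the loop avoids that. The sketch below uses a hypothetical TinyLoader class as a pure-Python stand-in for torch.utils.data.DataLoader (it is not part of ColossalAI or PyTorch) and mimics only the length rule: the number of batches is rounded up normally and rounded down when drop_last=True. The run_epoch helper is also made up for illustration.

```python
import math


class TinyLoader:
    """Hypothetical stand-in for torch.utils.data.DataLoader (batching only)."""

    def __init__(self, data, batch_size, drop_last=False):
        self.data = list(data)
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __len__(self):
        # Number of batches, not number of samples.
        n = len(self.data) / self.batch_size
        return math.floor(n) if self.drop_last else math.ceil(n)

    def __iter__(self):
        for b in range(len(self)):
            start = b * self.batch_size
            yield self.data[start:start + self.batch_size]


def run_epoch(loader, log=print):
    # Log the length BEFORE the loop, so it prints even when the loader is empty.
    log(f"Length of DataLoader: {len(loader)}")
    steps = 0
    for i, batch in enumerate(loader):
        # Anything logged here is skipped entirely if there are zero batches.
        log(f"step {i}: batch of {len(batch)} samples")
        steps += 1
    return steps


# 3 samples, batch size 8, drop_last=True: zero batches, loop body never runs.
empty_steps = run_epoch(TinyLoader(range(3), batch_size=8, drop_last=True))
# 10 samples, batch size 4, drop_last=False: three batches (4, 4, 2).
full_steps = run_epoch(TinyLoader(range(10), batch_size=4))
```

In the trainer, the equivalent change would be moving the coordinator.print_on_master call to the line just above the for loop. If it reports a length of 0, the dataset path, the batch size, and any drop_last setting are the first things to check.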
Is there an existing issue for this bug?
🐛 Describe the bug
I am using ColossalChat to train the opt-1.3b model. I modified train_sft.sh and ran SFT training. The run finishes successfully, but the progress bar looks abnormal and shows that evaluation is skipped. My command and log are below:
Environment