As of commit 20aeadd this should be fixed in all cases, for all combinations of hyperparameters. The only remaining caveat is that there might now be one batch that is much smaller than all the others. This doesn't matter for eval, but for training it means the gradient for that single step is based on far fewer tokens than the other steps. In theory that could cause instability, but for just one step I don't think it matters.
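For illustration only, here is a minimal sketch (hypothetical, not the actual qlora-pipe dataloader) of why chunking a fixed number of samples into fixed-size batches can leave one remainder batch that is much smaller than the rest:

```python
# Hypothetical sketch: splitting N samples into fixed-size batches leaves a
# final remainder batch that can be much smaller than the others.

def chunk_into_batches(samples, batch_size):
    """Split samples into consecutive batches of batch_size; the last
    batch holds whatever is left over and may be much smaller."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

batches = chunk_into_batches(list(range(1000)), 64)
print([len(b) for b in batches[-3:]])  # [64, 64, 40] -- the final batch is smaller
```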
I reworked the code quite a bit, so let me know if anything breaks or doesn't work as expected.
If you set the eval multiplier to something small, such that there are too few eval samples, sometimes `global_batches` is empty before the "sort by largest" swap here: https://github.com/tdrussell/qlora-pipe/blob/main/dataloader.py#L120, causing an index error.
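For reference, a guard along these lines would avoid the IndexError when the eval split produces no batches. This is a hypothetical sketch, not a patch against the real dataloader.py; only the name `global_batches` and the "sort by largest" swap come from the issue, everything else is illustrative:

```python
# Hypothetical sketch of the failure mode and a possible guard.

def swap_largest_batch_first(global_batches):
    """Move the largest global batch to index 0 (the 'sort by largest' swap).
    If global_batches is empty, indexing [0] raises the IndexError described
    above, so bail out early instead."""
    if not global_batches:  # too few eval samples -> nothing to swap
        return global_batches
    largest_idx = max(range(len(global_batches)),
                      key=lambda i: len(global_batches[i]))
    global_batches[0], global_batches[largest_idx] = (
        global_batches[largest_idx], global_batches[0])
    return global_batches
```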