The current implementation uses itertools.islice(train_loader, gradient_accum_steps) to drive gradient accumulation, which is incorrect: each call to islice implicitly creates a fresh iterator over the loader, so it only ever consumes the first few batches and every update ends up being computed from the same data instead of advancing through the dataset. It would be better to iterate with a regular enumerate(train_loader) loop and perform the optimizer update only every gradient_accum_steps micro-batches, as sketched below.
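For reference, a minimal sketch of what the corrected loop could look like. The model, optimizer, loss function, and data here are toy placeholders standing in for the project's existing objects; only the loop structure is the point.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup; in the real code these come from the existing training script.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=4,
    shuffle=True,
)
gradient_accum_steps = 4

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(train_loader):
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the accumulated gradient approximates the average
    # over the effective (large) batch rather than its sum.
    (loss / gradient_accum_steps).backward()
    # Step the optimizer only once every gradient_accum_steps micro-batches,
    # then clear the accumulated gradients for the next group.
    if (step + 1) % gradient_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

This way the loop keeps advancing through the DataLoader across updates, and the accumulation boundary is just the modulo check on the step index.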