pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

Ability to train based on epoch #613

Open abatilo opened 1 month ago

abatilo commented 1 month ago

Currently, training length can only be specified as a number of steps. It would be great if we could instead train for a given number of epochs over the passed-in dataset.

Additionally, some way to see how far into the dataset training has progressed would be amazing.

jaysonfrancis commented 1 month ago

I use something similar to this to approximate the current epoch:

epoch_approx = step / (total_tokens // global_batch_size)

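where total_tokens is the number of tokens in the dataset and global_batch_size is measured in tokens per step: global_batch_size = local_batch * seq_len * world_size (if tensor or pipeline parallelism is enabled, use the data-parallel degree instead of world_size, since those ranks share the same batch).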
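A minimal sketch of the approximation above, plus its inverse (how many steps correspond to N epochs), which is one way to emulate epoch-based training with a step-based config. It assumes total_tokens is already known (e.g. counted by tokenizing the dataset once up front) and that every position in every sequence is a real token (packed sequences, no padding). The function names and the dp_degree parameter are hypothetical, not part of torchtitan's API; with pure data parallelism dp_degree equals world_size.

```python
import math


def tokens_per_step(local_batch: int, seq_len: int, dp_degree: int) -> int:
    """Tokens consumed per optimizer step across all data-parallel ranks."""
    return local_batch * seq_len * dp_degree


def _steps_per_epoch(total_tokens: int, global_batch_tokens: int) -> int:
    # Integer division: a trailing partial batch is not counted as a step.
    steps = total_tokens // global_batch_tokens
    if steps == 0:
        raise ValueError("dataset is smaller than one global batch")
    return steps


def approx_epoch(step: int, total_tokens: int, global_batch_tokens: int) -> float:
    """Fractional epoch reached after `step` steps (epoch_approx in the comment)."""
    return step / _steps_per_epoch(total_tokens, global_batch_tokens)


def steps_for_epochs(num_epochs: float, total_tokens: int, global_batch_tokens: int) -> int:
    """Number of steps to configure so training covers roughly `num_epochs` epochs."""
    return math.ceil(num_epochs * _steps_per_epoch(total_tokens, global_batch_tokens))


if __name__ == "__main__":
    gbs = tokens_per_step(local_batch=8, seq_len=2048, dp_degree=4)  # 65536 tokens/step
    total = 1_000_000_000  # hypothetical dataset size in tokens
    print(approx_epoch(7629, total, gbs))   # 0.5
    print(steps_for_epochs(3, total, gbs))  # 45774
```

With 1B tokens and 65,536 tokens per step there are 15,258 full steps per epoch, so step 7,629 is halfway through the first epoch, and 3 epochs map to 45,774 steps.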