Open abatilo opened 1 month ago
At the moment, you can train for a fixed number of steps, but it would be great if we could train for a number of epochs over the passed-in dataset.
Additionally, some way to know how far into a dataset you are when training would be amazing.
I use something similar to this to approximate the current epoch:
epoch_approx = step / (total_tokens // global_batch_size)
where global_batch_size = local_batch * seq_len * world_size (i.e. tokens consumed per step)
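For reference, here is a rough Python sketch of that approximation, plus the inverse (how many steps to run for a target number of epochs). The function names are hypothetical and not part of any existing API. It assumes total_tokens is known up front and that every step consumes exactly local_batch * seq_len * world_size tokens, so padding or dropped samples would make it drift.

```python
import math


def tokens_per_step(local_batch: int, seq_len: int, world_size: int) -> int:
    """Tokens consumed across all ranks in one step (the global batch size)."""
    return local_batch * seq_len * world_size


def steps_per_epoch(total_tokens: int, local_batch: int, seq_len: int, world_size: int) -> int:
    """Whole steps needed to see the dataset once (hypothetical helper)."""
    steps = total_tokens // tokens_per_step(local_batch, seq_len, world_size)
    if steps == 0:
        raise ValueError("dataset is smaller than one global batch")
    return steps


def approx_epoch(step: int, total_tokens: int, local_batch: int, seq_len: int, world_size: int) -> float:
    """Approximate number of epochs completed after `step` steps."""
    return step / steps_per_epoch(total_tokens, local_batch, seq_len, world_size)


def steps_for_epochs(num_epochs: float, total_tokens: int, local_batch: int, seq_len: int, world_size: int) -> int:
    """Steps to run for `num_epochs` epochs, rounded up to a whole step."""
    return math.ceil(num_epochs * steps_per_epoch(total_tokens, local_batch, seq_len, world_size))
```

With something like steps_for_epochs, an epoch count could be translated into the existing step limit, and approx_epoch could be logged alongside the step counter to show progress through the dataset.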