Closed medric49 closed 7 months ago
According to this, it seems the trl.PPOConfig.steps
attribute is never used to build the dataloader.
A workaround that enforces the actual number of training steps using steps
:
```python
def dataloader():
    current_step = 0
    dataloader_iter = iter(ppo_trainer.dataloader)
    while current_step < learning_config.steps:
        try:
            yield next(dataloader_iter)
        except StopIteration:
            dataloader_iter = iter(ppo_trainer.dataloader)
            yield next(dataloader_iter)
        current_step += 1

for step, batch in tqdm(enumerate(dataloader()), total=ppo_config.steps):
    queries = batch["input_ids"]
```
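A minimal, self-contained sketch of the same cycling idea, using a toy list of batches and a hypothetical `total_steps` argument in place of `ppo_trainer.dataloader` and `learning_config.steps`:

```python
def cycling_dataloader(batches, total_steps):
    """Yield exactly `total_steps` batches, restarting from the
    beginning of `batches` whenever the iterator is exhausted."""
    current_step = 0
    batch_iter = iter(batches)
    while current_step < total_steps:
        try:
            yield next(batch_iter)
        except StopIteration:
            batch_iter = iter(batches)  # restart for another pass
            yield next(batch_iter)
        current_step += 1

# Toy example: 3 batches, 7 training steps -> dataset cycled ~2.3 times
batches = ["b0", "b1", "b2"]
steps = list(cycling_dataloader(batches, 7))
print(steps)  # ['b0', 'b1', 'b2', 'b0', 'b1', 'b2', 'b0']
```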
Hi @medric49, I think the number of steps is not related to the number of batches in the dataloader, but to how many optimization (gradient descent) steps you take. Imagine you have 10 batches in your training dataloader: if learning_config.steps
is 20, you will go over the whole training dataset twice (each batch will be used twice for a gradient descent step)!
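To illustrate the counting (a toy sketch, not TRL code): cycling 10 batches for 20 steps uses each batch exactly twice:

```python
from collections import Counter
from itertools import cycle, islice

num_batches, num_steps = 10, 20
# itertools.cycle restarts the iterable indefinitely;
# islice stops after exactly `num_steps` batches.
uses = Counter(islice(cycle(range(num_batches)), num_steps))
print(uses[0])  # each batch index is used 2 times
```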
Hi @karl-hajjar, I think what you have just described is the role of the ppo_epochs
parameter. If ppo_epochs=4
, gradient descent will be applied 4 times on each batch.
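In other words (a toy sketch; `ppo_epochs` is the real TRL parameter, everything else here is illustrative), the total number of gradient updates scales with both the number of batches and `ppo_epochs`:

```python
ppo_epochs = 4
batches = range(10)  # pretend the dataloader yields 10 batches
gradient_updates = 0
for batch in batches:            # one PPO training step per batch
    for _ in range(ppo_epochs):  # ppo_epochs optimization passes per batch
        gradient_updates += 1    # stand-in for optimizer.step()
print(gradient_updates)  # 40 = 10 batches * 4 epochs
```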
Also, looking at the source code, the steps
attribute from PPOConfig
never seems to be used in PPOTrainer
.
@medric49 Yes, I'm not sure what the steps attribute does, but in any case the number of batches is always fixed and (roughly) equal to len(dataset) // batch_size
. However, as you noted, increasing the epochs parameter (or perhaps the steps parameter) increases the number of times each batch is used. The dataloader does not produce more batches, but each batch is used more times, which is what you want in the end: more gradient step iterations. So you cannot really increase the number of batches, but that does not mean you cannot do more training iterations!
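For example (a plain-Python sketch of that count, assuming the dataloader drops the last incomplete batch):

```python
dataset_size, batch_size = 1000, 64
num_batches = dataset_size // batch_size  # 15 full batches
# Increasing steps/epochs reuses these 15 batches; it does not add new ones.
print(num_batches)
```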
Yes, with the current version we have a fixed number of batches on which to perform optimizations.
However, it seemed to me that the most popular approach is to generate random batches steps
times from a replay buffer (our dataset here), with the ppo_epochs
parameter controlling the number of optimizations performed per batch at each training step.
So, in the current implementation, the behavior of the dataloader is different from what I was expecting, but it also works very well.
It is just that the steps
parameter seems to be unused, and the concept of a "number of training steps" no longer exists.
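The replay-buffer style described above could be sketched as follows (illustrative only; this is not how TRL's PPOTrainer is implemented, and the `steps`/`ppo_epochs` semantics here are an assumption):

```python
import random

random.seed(0)
dataset = list(range(100))      # the "replay buffer"
batch_size, steps, ppo_epochs = 8, 5, 4

updates = 0
for _ in range(steps):                          # `steps` training iterations
    batch = random.sample(dataset, batch_size)  # fresh random batch each step
    for _ in range(ppo_epochs):                 # ppo_epochs updates per batch
        updates += 1                            # stand-in for one optimization
print(updates)  # 20 = 5 steps * 4 epochs
```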
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
My concern is about this example: trl/examples/scripts/ppo.py.
In the documentation, it is mentioned that the steps
parameter in trl.PPOConfig
represents the number of training steps, but when I change it, the number of training batches produced by trainer.dataloader
here does not change (apparently it only depends on the dataset size). What does this parameter really represent, and, in my case, how can I increase the number of training batches?