Hi, currently the code seems to be hardcoded to use only num_rollouts (configured to 1 here) prompts for training, regardless of the number of epochs, as can be seen in the prompt_dataloader creation in PPOTrainer (here). Is there a reason for this? Also, why is num_rollouts configured to 1? Thanks
Hi, the code now supports changing the default config in ppo_configs. In the future, ppo_config will be converted to a YAML file, and users will be able to change the settings there.