mindspore-lab / mindrlhf

Apache License 2.0

single prompt being used in training #52

Closed: kfertakis closed this issue 1 day ago

kfertakis commented 10 months ago

Hi,

Currently, the code seems to be hardcoded to use only `num_rollouts` prompts (configured to 1 here) for training, regardless of the number of epochs. This can be seen where `prompt_dataloader` is created in `PPOTrainer` (here). Is there a reason for this? Also, why is `num_rollouts` set to 1? Thanks
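To illustrate the behaviour being described, here is a hypothetical sketch (not the actual `PPOTrainer` code). The function name `prompts_per_epoch` and the `full_dataset` flag are made up for illustration. If the prompt loader is built from only the first `num_rollouts` prompts, every epoch sees the same slice, so with `num_rollouts=1` training keeps reusing a single prompt:

```python
from itertools import islice


def prompts_per_epoch(prompts, num_rollouts, epochs, full_dataset=False):
    """Return the list of prompts seen in each epoch (hypothetical illustration)."""
    seen = []
    for _ in range(epochs):
        if full_dataset:
            # Alternative: walk over every prompt in the dataset each epoch.
            epoch_prompts = list(prompts)
        else:
            # Behaviour described in the issue: only the first `num_rollouts`
            # prompts are ever used, no matter how many epochs are run.
            epoch_prompts = list(islice(prompts, num_rollouts))
        seen.append(epoch_prompts)
    return seen


prompts = ["p0", "p1", "p2", "p3"]
print(prompts_per_epoch(prompts, num_rollouts=1, epochs=3))
# [['p0'], ['p0'], ['p0']]
```

Increasing `epochs` only repeats the same slice. Only a larger `num_rollouts`, or iterating over the whole dataset, exposes more prompts to training.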

ChessQian commented 9 months ago

Hi, it is now possible to change the default config in `ppo_configs`. In the future, `ppo_config` will be converted to a YAML file, so users will be able to change the configs there.
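As a rough idea of how overriding a default such as `num_rollouts` might work, here is a minimal sketch that assumes a dataclass-style config. `PPOConfig`, its fields, and `load_config` are hypothetical and not mindrlhf's actual `ppo_configs`. The plain dict stands in for what a YAML file would be parsed into:

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class PPOConfig:
    # Hypothetical fields and defaults; the real ppo_config may differ.
    num_rollouts: int = 1
    epochs: int = 10


def load_config(overrides: dict) -> PPOConfig:
    """Apply user overrides (e.g. parsed from a YAML file) on top of the defaults."""
    defaults = PPOConfig()
    # Reject typos instead of silently ignoring them.
    unknown = set(overrides) - set(asdict(defaults))
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return replace(defaults, **overrides)


cfg = load_config({"num_rollouts": 256})
print(cfg)  # PPOConfig(num_rollouts=256, epochs=10)
```

Keeping the defaults in one frozen object and applying overrides through `replace` means a YAML file only needs to list the keys a user wants to change.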