Open SenZHANG-GitHub opened 1 year ago
I have the same concern. The same issue holds when initializing the critic. I guess we should use ds_config for the critic model and ds_eval_config for the reward model.
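To make that suggestion concrete, here is a minimal sketch of the split, not the actual DeepSpeed-Chat code. It assumes the usual RLHF setup where the actor and critic are trained (so they need optimizer-related settings) while the ref and reward models are frozen and only run forward passes. The helper names make_train_ds_config and make_eval_ds_config, the role mapping, and all default values are hypothetical; only the dictionary keys follow DeepSpeed's JSON config format.

```python
# Hypothetical helpers that build plain DeepSpeed-style config dicts.
# Defaults are illustrative assumptions, not values from DeepSpeed-Chat.

def make_train_ds_config(zero_stage=2, micro_batch_size=4):
    """Config for models that are updated during PPO (actor, critic)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "zero_optimization": {"stage": zero_stage},
        "fp16": {"enabled": True},
        # Only meaningful when gradients are computed and applied.
        "gradient_clipping": 1.0,
    }


def make_eval_ds_config(zero_stage=0, micro_batch_size=4):
    """Config for frozen models that only run forward passes (ref, reward)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        # Assumption: no optimizer state exists, so stage 0 is enough
        # unless the parameters themselves must be sharded.
        "zero_optimization": {"stage": zero_stage},
        "fp16": {"enabled": True},
    }


# Assumed mapping from model role to config kind, following the comment above.
ROLE_TO_CONFIG = {
    "actor": make_train_ds_config,
    "critic": make_train_ds_config,
    "ref": make_eval_ds_config,
    "reward": make_eval_ds_config,
}

configs = {role: build() for role, build in ROLE_TO_CONFIG.items()}
```

Under these assumptions, configs["critic"] carries the training-only settings and configs["reward"] does not, which is one plausible reason for keeping two separate configs.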
I have the same confusion. Is there any progress on this issue?
When initializing the reward and ref models in step 3 of DeepSpeed-Chat, two DeepSpeed config files are used: ds_config and ds_eval_config. May I ask why two configs are needed here, and is there a safe way to remove ds_eval_config? e.g.,