sunzeyeah / RLHF

Implementation of Chinese ChatGPT

Is there a comparison of results with and without RLHF? #4

macheng6 closed this issue 1 year ago

macheng6 commented 1 year ago

As the title says.

sunzeyeah commented 1 year ago

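Hello, the RLHF part is still being debugged and optimized. It has to load the SFT model and the reward model at the same time, so it consumes a lot of compute, and the convergence stability of RL training is hard to guarantee.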