openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
https://openreasoner.github.io/
MIT License
1.04k stars, 76 forks

Will this project support PRM training with soft labels? #57

Open Dada-Cloudzxy opened 5 days ago

Dada-Cloudzxy commented 5 days ago

OmegaPRM and Math-Shepherd both seem to report that soft labels work better. Will this project support them?

yitianlian commented 2 hours ago

I'm also interested in this! From your OpenR paper, my guess is that you assign the + label whenever mc_value is greater than 0 (if I understand correctly), meaning that the path can still lead to a correct answer. I don't think that's a good idea, though, and other work [1] uses regression to predict the reward instead.

  1. Step-level Value Preference Optimization for Mathematical Reasoning
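
To make the difference concrete, here is a minimal sketch of the two labeling schemes being discussed. It assumes `mc_value` is the fraction of Monte Carlo rollouts from a step that reach the correct answer; the function names (`hard_label`, `soft_label`, `bce`, `mse`) are hypothetical and are not taken from the OpenR codebase.

```python
import math

def hard_label(mc_value: float) -> int:
    # Assumed hard-label rule from the comment above: "+" (1) if any rollout succeeds.
    return 1 if mc_value > 0 else 0

def soft_label(mc_value: float) -> float:
    # Soft label: keep the Monte Carlo estimate itself as the training target.
    return min(max(mc_value, 0.0), 1.0)

def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    # Binary cross-entropy; works with both 0/1 and fractional targets.
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def mse(pred: float, target: float) -> float:
    # Squared error, i.e. the regression-style objective.
    return (pred - target) ** 2

# Illustrative mc_values only (not real data).
mc_values = [0.0, 0.125, 0.5, 1.0]
hard = [hard_label(v) for v in mc_values]
soft = [soft_label(v) for v in mc_values]

for v, h, s in zip(mc_values, hard, soft):
    print(f"mc_value={v:.3f}  hard={h}  soft={s:.3f}")
```

With hard labels, a step that succeeds in 1 of 8 rollouts gets the same target as a step that always succeeds; with soft labels the two stay distinguishable, which is the concern raised above.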