openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
https://openreasoner.github.io/
MIT License

How can I change the data format to preprocess the data generated by data/omegaPRM_v2 #46

Closed FanqingM closed 4 days ago

FanqingM commented 2 weeks ago

System Info

Same with this repo.

Who can help?

@mengfn, @gzqaq, @Gebro13, @YanSong97, @iamlilAJ

Reproduction

First, I generated data with data/omegaPRM_v2, as below:

    "question": "In three-dimensional space, find the number of lattice points that have a distance of 3 from the origin.\n\nNote: A point is a lattice point if all its coordinates are integers.",
    "final_answer": "30",
    "reasoning_steps": [
        { "solution_prefix": "xx", "mc_value": 0.5 },
        { "solution_prefix": "xx", "mc_value": 1.0 }...

Next, I want to train the PRM model using prm/code/finetune_qwen.py. It seems that I should preprocess the data above first. I noticed the preprocess directory, where the authors give an example in tests/samples_math-aps.json, as below:

    [{"q": "Find the sum of all integers $k$ such that $\binom{23}{4} + \binom{23}{5} = \binom{24}{k}$.", "states": [{"rollout": " To solve the equation \(\binom{23}{4} + \binom{23}{5} = \binom{24}{k}\), we start by using the identity for binomial coefficients that states \(\binom{n}{r} + \binom{n}{r+1} = \binom{n+1}{r+1}\). Here, \(n = 23\) and \(r = 4\), so we have:\n\n\[\n\binom{23}{4} + \binom{23}{5} = \binom{24}{5}\n\]\n\nThis means that \(\binom{24}{k} = \binom{24}{5}\). The binomial coefficient \(\binom{24}{k}\) is symmetric, i.e., \(\binom{24}{k} = \binom{24}{24-k}\).", "state": "", "mcs": 0.6307692307692307},....

This appears to be the input format for preprocess/src/preprocessors/math_aps.py.

However, the data generated by OmegaPRMv2 differs from the data that preprocess expects.

How can I convert the data generated by OmegaPRMv2 so that it can be preprocessed for PRM training?

Expected behavior

A way to convert the data generated by OmegaPRMv2 so that it can be preprocessed for PRM training.

FanqingM commented 2 weeks ago

On the other hand, if I choose save_tree, I get this from OmegaPRMv2:

    "question": "In rectangle $ABCD$, $AB=100$. Let $E$ be the midpoint of $\overline{AD}$. Given that line $AC$ and line $BE$ are perpendicular, find the greatest integer less than $AD$.\n",
    "final_answer": "141",
    "reasoning_steps": {
        "text": "In rectangle $ABCD$, $AB=100$. Let $E$ be the midpoint of $\overline{AD}$. Given that line $AC$ and line $BE$ are perpendicular, find the greatest integer less than $AD$.",
        "mc_value": 0.9375,
        "children": [
            { "text": "xxx", "mc_value": 1.0, "children": [] },

I wonder how to convert it into the input format that preprocess expects.

FanqingM commented 2 weeks ago

It seems that the input dataset for PRM training (Math-PSA) is openr/prm/code/test.json. How can I get this data? The data I generated with OmegaPRM_v2 does not look like this.

iamlilAJ commented 2 weeks ago

For data generated by OmegaPRM_v2, two formats are available:

  1. Flat Format (save_data_tree=False): Each entry is structured as:

    { "solution_prefix": [Q, x_1:x_i], "mc_value": 0.5 }

    where i is a variable representing the number of reasoning steps. This format provides a linear view of the reasoning process without hierarchical structure.

  2. Tree Format (save_data_tree=True): In this format, data is organized as a tree structure, aligned with the figure presented in the paper. Each reasoning step (or node) includes:

    • text: The cumulative reasoning from the root node up to this specific step.
    • mc_value: The Monte Carlo score computed for the reasoning progression up to this step.
    • children: A list of child nodes branching from the current node.
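
As a rough illustration of the tree format described above, the nodes can be walked depth-first to get one (text, mc_value) record per reasoning step. This is only a sketch: the field names text, mc_value and children come from the description above, while the function name iter_tree_steps and the toy tree are made up for the example and are not part of the repo.

```python
from typing import Any, Dict, Iterator


def iter_tree_steps(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per tree node, depth-first, parents before children.

    Hypothetical helper: assumes every node has "text" and "mc_value",
    and optionally a "children" list of nodes with the same shape.
    """
    stack = [node]
    while stack:
        current = stack.pop()
        yield {"text": current["text"], "mc_value": current["mc_value"]}
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(current.get("children", [])))


# Toy tree, invented for illustration only.
example_tree = {
    "text": "Q",
    "mc_value": 0.9375,
    "children": [
        {"text": "Q step1a", "mc_value": 1.0, "children": []},
        {
            "text": "Q step1b",
            "mc_value": 0.5,
            "children": [
                {"text": "Q step1b step2", "mc_value": 0.0, "children": []},
            ],
        },
    ],
}

flat_records = list(iter_tree_steps(example_tree))
for record in flat_records:
    print(record["mc_value"], repr(record["text"]))
```

An explicit stack is used instead of recursion so that very deep trees do not hit Python's recursion limit.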

With these two formats, you should have the flexibility to preprocess the data in ways that best suit your custom training needs.
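
For example, one possible way to reshape a flat-format entry into the q / states layout seen in tests/samples_math-aps.json is sketched below. The field mapping is an assumption, not something confirmed against preprocess/src/preprocessors/math_aps.py: solution_prefix is treated as a string, the leading question text is stripped from it to form rollout, state is left empty, and mc_value is copied to mcs. The function name to_states_layout and the sample entry are hypothetical.

```python
import json
from typing import Any, Dict, List


def to_states_layout(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one flat-format entry to a {"q", "states"} dict.

    Assumed mapping (unverified against the real preprocessor):
      question        -> q
      solution_prefix -> rollout (question text removed if it is repeated)
      mc_value        -> mcs
      state           -> "" (left empty)
    """
    question = entry["question"]
    states: List[Dict[str, Any]] = []
    for step in entry["reasoning_steps"]:
        rollout = step["solution_prefix"]
        # Assumption: the cumulative prefix may start with the question itself.
        if rollout.startswith(question):
            rollout = rollout[len(question):]
        states.append({"rollout": rollout, "state": "", "mcs": step["mc_value"]})
    return {"q": question, "states": states}


# Toy entry, invented for illustration only.
sample_entry = {
    "question": "What is 1 + 1?",
    "final_answer": "2",
    "reasoning_steps": [
        {"solution_prefix": "What is 1 + 1? Step 1: add the numbers.", "mc_value": 0.5},
        {
            "solution_prefix": "What is 1 + 1? Step 1: add the numbers. Step 2: the answer is 2.",
            "mc_value": 1.0,
        },
    ],
}

converted = to_states_layout(sample_entry)
print(json.dumps(converted, indent=2))
```

Whether rollout should hold the whole cumulative prefix or only the newest step, and what state should contain, depends on what the preprocessor actually expects, so the mapping may need adjusting.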

gzqaq commented 2 weeks ago

Please wait for us to add support for preprocessing data generated by OmegaPRM-v2. Progress is tracked in #47.

gzqaq commented 1 week ago

Reopening because #47 was accidentally merged. Progress is tracked in the new PR #52.