Open JumpingRain opened 1 week ago
Hello @JumpingRain, there is an open PR for this, #1954, which is currently under development.
Feature request
Please add support for GRPO (Group Relative Policy Optimization). Qwen2.5-Math and Qwen2.5-Coder are two state-of-the-art models that have recently adopted it.
Motivation
Qwen2.5-Math blog post: https://qwenlm.github.io/blog/qwen2.5-math/
GRPO paper: https://arxiv.org/pdf/2402.03300
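For context, the step that gives GRPO its name is scoring each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. Below is a minimal sketch of that normalization as I understand it from the linked paper. It is not the implementation in #1954. The function name `group_relative_advantages`, the `eps` term, and the choice of population standard deviation are my own assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group (all samples for a single prompt).

    Hypothetical helper: each advantage is the reward minus the group mean,
    divided by the group standard deviation.
    """
    mu = mean(rewards)
    # Assumption: population std; eps guards against a group of identical rewards.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above their group's average get a positive advantage and those below get a negative one, so no separate critic model is needed for the baseline.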
Your contribution
This is a request only; I am not contributing an implementation.