openpsi-project / ReaLHF

Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Apache License 2.0
Add the GRPO algorithm. #31

Closed by garrett4wade 2 months ago

garrett4wade commented 2 months ago

Corresponding changes: