liziniu/ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
151 stars, 13 forks
issues
Mistral as the backbone of reward model
#3, renatz, closed, 8 months ago, 5 comments
Reproducing llama-2-7b results
#2, EsYoon7, closed, 8 months ago, 4 comments
Bugs when using zero-stage3
#1, George-Chia, closed, 12 months ago, 0 comments