Open wangskyGit opened 7 months ago
I uploaded llama-se-rl-peft to Hugging Face. Is that what you're looking for?
See https://github.com/mnoukhov/elastic-reset/tree/main/stackllama
Hello, I would like to ask about the model you provided. Was the llama-se-rl-peft model trained with PPO plus Elastic Reset, or with PPO alone? Could you share both the PPO baseline model and the model trained with Elastic Reset?
Hi! This is nice work: the method is simple but effective. I am wondering if you could open-source the PPO baseline model as well, since I would like to reproduce the results from Table 3 in the paper. It would be great if you could upload the PPO baseline model so that we don't need to rerun PPO. Thanks a lot.