Hi! About PPO baseline weight diff

mnoukhov / elastic-reset

Code and Experiments for "Language Model Alignment with Elastic Reset" (NeurIPS 2023)

Apache License 2.0


Hi! About PPO baseline weight diff #1

Open wangskyGit opened 7 months ago

wangskyGit commented 7 months ago

Hi! This is nice work, and the method is simple but effective. Could you open-source the PPO baseline model as well? I would like to reproduce the results from Table 3 in the paper, and it would be great if you could upload the PPO baseline model so that we don't need to rerun PPO. Thanks a lot.

mnoukhov commented 7 months ago

I uploaded llama-se-rl-peft to Hugging Face. Is that what you're looking for? See https://github.com/mnoukhov/elastic-reset/tree/main/stackllama

NEUBuffett commented 7 months ago

Hello, I have a question about the model you mentioned. Was the llama-se-rl-peft model you provided trained with PPO using Elastic Reset, or without it? Could you share both the PPO baseline model and the model trained with Elastic Reset?