l294265421 / alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat
https://88aeeb3aef5040507e.gradio.live/
MIT License
103 stars, 13 forks
alpaca chatgpt language-model large-language-models llama llm reinforcement-learning rlhf

alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback).

Online Demo

Modifications on DeepSpeed Chat

Step 1

Step 2

Step 3

Step by Step

Comparison between SFT and RLHF

References

Articles

Sources

Tools

Datasets

Related Repositories