ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] Add RLHF Example #33021

Open ArturNiederfahrenhorst opened 1 year ago

ArturNiederfahrenhorst commented 1 year ago

Description

RLlib currently lacks an RLHF example.

Reinforcement learning from human feedback (RLHF) is an approach to reinforcement learning that incorporates human preference feedback into the training process. It has recently become a hot topic with ChatGPT. We would like to keep up with this trend and are tracking this as a feature request.
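To make the idea concrete, a common ingredient of RLHF fine-tuning is reward shaping: the reward model's score is penalized by a KL term that keeps the tuned policy close to a frozen reference model. A minimal sketch in plain Python (the function name and numbers are illustrative, not RLlib API):

```python
# Sketch of RLHF per-token reward shaping (illustrative, not RLlib API):
# r = r_RM - beta * (log pi(a|s) - log pi_ref(a|s))
# The KL-style penalty discourages the policy from drifting far from the
# frozen reference model while it chases the reward model's score.

def shaped_reward(rm_score, logp_policy, logp_ref, kl_coeff=0.1):
    """Reward-model score minus a KL penalty against the reference policy."""
    return rm_score - kl_coeff * (logp_policy - logp_ref)

# Policy assigns higher log-prob than the reference -> reward is reduced.
r = shaped_reward(rm_score=1.0, logp_policy=-1.0, logp_ref=-2.0, kl_coeff=0.1)
```

When the policy and reference agree exactly, the penalty vanishes and the shaped reward equals the reward model's score.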

Dataset candidate: https://huggingface.co/datasets/Anthropic/hh-rlhf
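The hh-rlhf dataset stores each example as a pair of transcripts under the keys `chosen` and `rejected`. A reward model is typically trained on such pairs with the Bradley-Terry pairwise loss, -log sigmoid(r_chosen - r_rejected). A self-contained sketch with toy scores (the field names match the dataset; the scores themselves are placeholders for a real reward model's outputs):

```python
import math

# Bradley-Terry pairwise loss for reward-model training on preference pairs:
# loss = -log sigmoid(r_chosen - r_rejected). The loss shrinks as the margin
# between the preferred and rejected response scores grows.

def pairwise_loss(r_chosen, r_rejected):
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Shape of an hh-rlhf record (toy content, real field names).
pair = {
    "chosen": "\n\nHuman: Hi\n\nAssistant: Hello, how can I help?",
    "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
}

loss = pairwise_loss(r_chosen=2.0, r_rejected=0.5)
```

With equal scores the loss is log 2, and it decreases monotonically as the reward model learns to rank the chosen response above the rejected one.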

gjoliver commented 1 year ago

tentative place to hold this e2e example: https://github.com/maxpumperla/chatair

jspisak commented 1 year ago

Will this example be framework agnostic? It would be really cool if we could get something that supports JAX given the rest of the community has rallied around TRLX.

gjoliver commented 1 year ago

> Will this example be framework agnostic? It would be really cool if we could get something that supports JAX given the rest of the community has rallied around TRLX.

RLlib doesn't support JAX at this point.

weberxie commented 1 year ago

Hi team, any updates on RLHF Example?

swaroopch commented 1 year ago

Please consider adding https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/README.md as an example, similar to https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html 🙏🏽

ArturNiederfahrenhorst commented 1 year ago

CC @gjoliver @kouroshHakha

> Request to please add https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/README.md as an example, similar to https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html 🙏🏽

treeguard commented 1 year ago

any updates?

wuxibin89 commented 1 year ago

Our project OpenLLMAI/OpenLLaMA2 implements a high-performance RLHF framework based on Ray and DeepSpeed. Thanks to Ray's flexibility, we can run 34B LLaMA2 PPO training on a single DGX-A100 node with ZeRO-2, achieving high text-generation throughput when making experience (i.e., generating and scoring rollouts).

For anyone interested, you can get started at: https://github.com/OpenLLMAI/OpenLLaMA2/blob/main/examples/train_ppo_ray.py
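For readers unfamiliar with the "make experience" step mentioned above: in PPO-based RLHF it means generating responses, scoring them, and computing per-token advantages before the policy update. A pure-Python sketch of generalized advantage estimation (GAE) over one response, with toy rewards and values (not the OpenLLaMA2 API):

```python
# Generalized advantage estimation (GAE) over one generated response.
# In RLHF the reward is typically sparse: only the final token receives
# the reward-model score, and GAE propagates credit backward through
# the value estimates. Numbers below are toy values for illustration.

def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Compute per-token advantages with a backward recursion over deltas."""
    advantages, last = [], 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last = delta + gamma * lam * last
        advantages.append(last)
    return advantages[::-1]

# Sparse terminal reward from the reward model on the last token.
advs = gae_advantages(rewards=[0.0, 0.0, 1.0], values=[0.2, 0.3, 0.5])
```

These advantages, together with the stored log-probabilities, are what the subsequent PPO update consumes.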