ArturNiederfahrenhorst opened this issue 1 year ago
tentative place to hold this e2e example: https://github.com/maxpumperla/chatair
Will this example be framework agnostic? It would be really cool if we could get something that supports JAX, given that the rest of the community has rallied around TRLX.
> Will this example be framework agnostic? It would be really cool if we could get something that supports JAX, given that the rest of the community has rallied around TRLX.
RLlib doesn't support JAX at this point.
Hi team, any updates on RLHF Example?
Request to please add https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/README.md as an example, similar to https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html 🙏🏽
CC @gjoliver @kouroshHakha
> Request to please add https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/README.md as an example, similar to https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html 🙏🏽
any updates?
Our project OpenLLMAI/OpenLLaMA2 implements a high-performance RLHF framework based on Ray and DeepSpeed. Thanks to Ray's flexibility, we can run 34B LLaMA2 PPO training on a single DGX A100 node with ZeRO-2, achieving high text-generation throughput during the experience-making (rollout) phase.
For anyone interested, you can get started at: https://github.com/OpenLLMAI/OpenLLaMA2/blob/main/examples/train_ppo_ray.py
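For readers unfamiliar with the PPO step that frameworks like the one above distribute with Ray: the core of the policy update is the clipped surrogate objective. Below is a minimal, framework-free sketch of that objective for a single action — an illustration of standard PPO, not code from OpenLLaMA2 or RLlib.

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate loss for one token/action.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    Returns the negative of the clipped objective, so minimizing this
    loss maximizes the surrogate reward while keeping the policy close
    to the one that generated the experience.
    """
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    # PPO takes the pessimistic (smaller) objective of the two.
    return -min(unclipped, clipped)
```

With `logp_new == logp_old` the ratio is 1 and the loss is simply `-advantage`; when the ratio drifts outside `[1 - clip_eps, 1 + clip_eps]`, the clipped branch caps the incentive to move further.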
Description
RLlib currently lacks an RLHF example.
Reinforcement learning from human feedback (RLHF) is an approach to reinforcement learning that incorporates human feedback to improve the learning process. It has recently become a hot topic with ChatGPT. We would like to keep up with this trend and are tracking this as a feature request.
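For concreteness, the reward model at the heart of an RLHF pipeline is typically trained with a pairwise (Bradley-Terry style) preference loss: the model should score the human-preferred response above the rejected one. A minimal sketch of that loss, independent of any particular framework:

```python
import math

def reward_model_pairwise_loss(r_chosen, r_rejected):
    """Pairwise preference loss commonly used for RLHF reward models:
    -log(sigmoid(r_chosen - r_rejected)).

    The loss approaches 0 as the reward model scores the chosen
    (human-preferred) response increasingly above the rejected one,
    and grows when the ranking is reversed.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When both responses get the same score the loss is `log(2)`; a clearly correct ranking drives it toward zero.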
Dataset candidate: https://huggingface.co/datasets/Anthropic/hh-rlhf
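Per the dataset card, each hh-rlhf record carries two transcripts under the `chosen` and `rejected` keys, written as alternating `\n\nHuman:` / `\n\nAssistant:` turns and differing in the final assistant reply. A small illustrative helper (the function name is my own, not part of the dataset tooling) for splitting such a transcript into turns:

```python
def split_hh_transcript(transcript):
    """Split an hh-rlhf style transcript (alternating "\n\nHuman:" /
    "\n\nAssistant:" turns) into a list of (role, text) pairs."""
    turns = []
    for chunk in transcript.split("\n\n"):
        if chunk.startswith("Human:"):
            turns.append(("human", chunk[len("Human:"):].strip()))
        elif chunk.startswith("Assistant:"):
            turns.append(("assistant", chunk[len("Assistant:"):].strip()))
    return turns

# Shape of one record (contents invented for illustration):
example = {
    "chosen": "\n\nHuman: What is RLHF?\n\nAssistant: RL from human feedback.",
    "rejected": "\n\nHuman: What is RLHF?\n\nAssistant: I don't know.",
}
```

Feeding the `chosen`/`rejected` pair of each record through a reward model then yields the scores needed for a pairwise preference loss.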