mengdi-li / awesome-RLAIF

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
Apache License 2.0
alignment llms rl rlaif rlhf

Awesome RLAIF

Awesome https://github.com/mengdi-li/awesome-RLAIF/blob/main/LICENSE

An actively updated list of literature on Reinforcement Learning from AI Feedback (RLAIF).

What is RLAIF? According to ChatGPT:

Reinforcement Learning from AI Feedback (RLAIF) is a concept that describes a type of machine learning approach where an AI agent learns by receiving feedback or guidance from another AI system. This concept is closely related to the field of Reinforcement Learning (RL), which is a type of machine learning where an agent learns to make a sequence of decisions in an environment to maximize a cumulative reward.

In traditional RL, an agent interacts with an environment and receives feedback in the form of rewards or penalties based on the actions it takes. It learns to improve its decision-making over time to achieve its goals. In the context of Reinforcement Learning from AI Feedback, the AI agent still aims to learn optimal behavior through interactions, but the feedback comes from another AI system rather than from the environment or human evaluators. This can be particularly useful in situations where it may be challenging to define clear reward functions or when it is more efficient to use another AI system to provide guidance. The feedback from the AI system can take various forms.

This approach is often used in scenarios where the RL agent needs to learn from limited human or expert feedback or when the reward signal from the environment is sparse or unclear. It can also be used to accelerate the learning process and make RL more sample-efficient. Reinforcement Learning from AI Feedback is an area of ongoing research and has applications in various domains, including robotics, autonomous vehicles, and game playing, among others.
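As a rough illustration of the idea described above, the following toy sketch trains a tiny softmax policy from pairwise preferences produced by a stand-in "AI labeler". Everything here is an assumption made for illustration: `RESPONSES`, `ai_labeler`, and `train` are hypothetical names, and the labeler is a simple length rule rather than a real model. It is not taken from any paper in this list.

```python
import math
import random

# Hypothetical candidate responses the toy policy chooses between.
RESPONSES = ["ok", "a short answer", "a longer, more detailed answer"]


def ai_labeler(a: str, b: str) -> int:
    """Stand-in for an AI feedback model: return 0 if `a` is preferred, else 1.

    Assumption: a real labeler would be a model asked to compare two outputs;
    this placeholder simply prefers the longer response.
    """
    return 0 if len(a) >= len(b) else 1


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 1000, lr: float = 0.1, seed: int = 0):
    """Update policy logits from AI preferences instead of environment rewards."""
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample two responses from the current policy.
        i, j = rng.choices(range(len(RESPONSES)), weights=probs, k=2)
        if i == j:
            continue  # no preference signal when both samples are identical
        preferred = ai_labeler(RESPONSES[i], RESPONSES[j])
        winner, loser = (i, j) if preferred == 0 else (j, i)
        # REINFORCE-style step with reward +1 for the winner and -1 for the
        # loser; for a softmax policy the two gradients combine to this form.
        logits[winner] += lr
        logits[loser] -= lr
    return softmax(logits)


if __name__ == "__main__":
    for response, p in zip(RESPONSES, train()):
        print(f"{p:.3f}  {response}")
```

The only reward signal the policy sees comes from `ai_labeler`, which is the defining feature of the setup; swapping in a different labeler changes what the policy learns without touching the update rule.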

Note

Some of the listed papers are not strictly RLAIF methods but are closely related. For example, some use SFT instead of RL to tune parameters based on AI feedback or AI generations, and some use MPC instead of RL policies for robotic control. Since RLAIF research is still at an early stage, we believe these papers will benefit the community, so we have included them in this reading list. Each entry is tagged to facilitate paper search.

Papers

format:
- [title](paper link) | ![](https://img.shields.io/badge/CONFERENCE_'YEAR-blue)
  - Authors: ...
  - <details> <summary>Abstract (click me)</summary> ... </details>
  - Links: [Project website](website link), [Code](code link), [Dataset](dataset link), ...
  - Tags: ...
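
For contributors who prefer to generate entries programmatically, here is a minimal sketch of a helper that renders one entry in the format above. The function name `format_entry`, its parameters, and all values in the usage example are placeholders chosen for illustration; the sketch also assumes the venue string contains no spaces or dashes, which would need escaping in a shields.io badge URL.

```python
def format_entry(title, paper_url, venue, year, authors, abstract, links, tags):
    """Render one reading-list entry following the template shown above.

    `links` is a dict mapping a label (e.g. "Code") to a URL.
    """
    badge = f"https://img.shields.io/badge/{venue}_'{year}-blue"
    link_text = ", ".join(f"[{name}]({url})" for name, url in links.items())
    return "\n".join([
        f"- [{title}]({paper_url}) | ![]({badge})",
        f"  - Authors: {', '.join(authors)}",
        f"  - <details> <summary>Abstract (click me)</summary> {abstract} </details>",
        f"  - Links: {link_text}",
        f"  - Tags: {', '.join(tags)}",
    ])


if __name__ == "__main__":
    # Placeholder values only; this is not a real paper.
    print(format_entry(
        title="Example Title",
        paper_url="https://example.com/paper",
        venue="VENUE",
        year="24",
        authors=["First Author", "Second Author"],
        abstract="Placeholder abstract.",
        links={"Code": "https://example.com/code"},
        tags=["placeholder-tag"],
    ))
```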

2024

2023

2022

Related Blogs

Related Awesome Repos

Contributing

Let's make the list more comprehensive.

👥 Contributors