voidful / TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License
539 stars, 60 forks

Documentation on Methodology #18

flyingabove closed this issue 1 year ago

flyingabove commented 1 year ago

Hi,

Great Repo!

I see that some papers are listed in the comments. Could you give us a quick guide or a list of the papers corresponding to the techniques you have implemented?

Thanks!

voidful commented 1 year ago

Hi,

Thank you for your interest in the TextRL library! TextRL combines multiple techniques and ideas, so I can't point to specific papers that correspond to the implementation. Instead, here are some resources related to the general concepts of text generation, reinforcement learning, and fine-tuning pre-trained language models. They might help you understand the techniques and motivation behind TextRL.

RL4LMs (Allen Institute for AI): https://rl4lms.apps.allenai.org
Anthropic's HH-RLHF human preference dataset: https://github.com/anthropics/hh-rlhf/tree/master

If you're looking for more specific details, check the documentation and source code of the libraries TextRL builds on: Hugging Face's Transformers, PFRL, and OpenAI Gym.
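The reason Gym and an RL library like PFRL fit together with Transformers is that token-by-token generation can be framed as an episodic RL environment: the state is the prompt plus the tokens generated so far, each action picks the next token, and a reward is given once the sequence is finished. The sketch below illustrates that framing only. It is not TextRL's actual API: the class `TextGenerationEnv`, the toy `VOCAB`, `toy_reward` (a stand-in for a learned reward model) and `run_episode` are all hypothetical, and the environment mimics the classic Gym `reset()`/`step()` 4-tuple convention without importing `gym`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy vocabulary standing in for a real tokenizer's vocabulary.
VOCAB = ["<eos>", "good", "bad", "movie", "great"]


class TextGenerationEnv:
    """Hypothetical env: generation as an episode, mimicking classic Gym reset/step."""

    def __init__(self, prompt: List[str], reward_fn: Callable[[List[str]], float],
                 max_new_tokens: int = 5):
        self.prompt = list(prompt)
        self.reward_fn = reward_fn
        self.max_new_tokens = max_new_tokens
        self.generated: List[str] = []

    def reset(self) -> List[str]:
        # Start a new episode: the observation is just the prompt.
        self.generated = []
        return self.prompt + self.generated

    def step(self, action: int) -> Tuple[List[str], float, bool, Dict]:
        # The action is the index of the next token to append.
        token = VOCAB[action]
        self.generated.append(token)
        done = token == "<eos>" or len(self.generated) >= self.max_new_tokens
        # Sparse reward: only the finished sequence is scored, as a reward model would be.
        reward = self.reward_fn(self.generated) if done else 0.0
        return self.prompt + self.generated, reward, done, {}


def toy_reward(tokens: List[str]) -> float:
    """Hypothetical reward: +1 per positive word, -1 per negative word."""
    positive = sum(t in ("good", "great") for t in tokens)
    negative = sum(t == "bad" for t in tokens)
    return float(positive - negative)


def run_episode(env: TextGenerationEnv,
                policy: Callable[[List[str]], int]) -> Tuple[List[str], float]:
    """Roll out one episode; in practice the policy would be the language model."""
    obs = env.reset()
    done = False
    total = 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return env.generated, total


if __name__ == "__main__":
    env = TextGenerationEnv(["this", "movie", "is"], toy_reward, max_new_tokens=3)
    # A scripted policy that always emits "great".
    tokens, episode_return = run_episode(env, lambda obs: VOCAB.index("great"))
    print(tokens, episode_return)
```

In a real setup, the scripted policy would be replaced by the language model's next-token distribution, and an RL agent (for example one from PFRL) would update the model's weights to maximize the episode return.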