l294265421 / alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

Steps #2


syngokhan commented 1 year ago

Hey, how are you? First of all, thank you for providing this repo. I have a question about the steps.

Are we supposed to run every step here one by one, in order?

Or can we pick a single step and test its results on its own?

Also, I want to build a chatbot in a conversational-AI style. How should the data be formatted for this? The chatbot keeps what it generates as history, but how do we represent that history in the training data? Right now I am only working this out in my head. Can you help me with this too?

If there is anything I have missed, or anything you would like to add, I would appreciate it.

Thank you for everything.

l294265421 commented 1 year ago
1. Steps 1 and 2 can each be run and tested on their own, but step 3 depends on the outputs of both: the supervised fine-tuned (SFT) model from step 1 and the reward model from step 2 (see the first sketch after this list).
2. The chatbot in this repo (alpaca_rlhf/inference/llama_chatbot_gradio.py) supports multi-turn dialogue. That is, until you click the Clear History button, the history, including both what the users say and the responses the chatbot generates, is used as input for the next turn (see the second sketch below). The training data (https://huggingface.co/datasets/Dahoas/rm-static) also includes multi-turn dialogue data.
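
For illustration, here is a minimal, runnable sketch of that dependency. The function names (`train_sft`, `train_reward_model`, `train_ppo`) are hypothetical placeholders, not this repo's actual entry points; only the data flow between the three steps is the point.

```python
# Hypothetical placeholders standing in for the real training scripts,
# used only to show which steps depend on which.

def train_sft(base_model: str) -> str:
    # Step 1: supervised fine-tuning; produces the initial actor model.
    return f"{base_model}-sft"

def train_reward_model(base_model: str) -> str:
    # Step 2: reward-model training; independent of step 1.
    return f"{base_model}-rm"

def train_ppo(actor_ckpt: str, reward_ckpt: str) -> str:
    # Step 3: RLHF with PPO; consumes the actor from step 1
    # and the reward model from step 2.
    return f"{actor_ckpt}+ppo(reward={reward_ckpt})"

actor = train_sft("llama-7b")             # can be run and tested alone
reward = train_reward_model("llama-7b")   # can be run and tested alone
final = train_ppo(actor, reward)          # only runs after steps 1 and 2
print(final)
```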
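
And here is a minimal sketch of how multi-turn history can be flattened into a single prompt in the `Human:`/`Assistant:` style that Dahoas/rm-static uses. `build_prompt` is a hypothetical helper, and the exact template in llama_chatbot_gradio.py may differ.

```python
def build_prompt(history, user_message):
    """Flatten earlier turns plus the new user message into one prompt.

    history: list of (user_text, bot_text) pairs from earlier turns.
    """
    prompt = ""
    for user_text, bot_text in history:
        prompt += f"\n\nHuman: {user_text}\n\nAssistant: {bot_text}"
    # The new turn ends with an open "Assistant:" for the model to complete.
    prompt += f"\n\nHuman: {user_message}\n\nAssistant:"
    return prompt

# Example: one earlier turn of history, then a follow-up question.
history = [("What is RLHF?",
            "RLHF is reinforcement learning from human feedback.")]
print(build_prompt(history, "How does step 3 use it?"))
```

Clearing the history (the Clear History button) simply corresponds to starting again with an empty `history` list.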