l294265421 / alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

Steps #2


syngokhan commented 1 year ago

Hey, how are you? First of all, thank you for providing this repo. I have a question about the steps.

Are we supposed to run every step here one by one, in order?

Or can we pick a single step and test its results on its own?

Also, I want to build a chatbot in a conversational-AI style. How should the data be formatted for this? The chatbot keeps what it generates as history, but how do we represent that history in the training data? Right now I am only working this out in my head. Can you help me with this too?

If there is anything I have missed, or anything you would like to add, I would appreciate it.

Thank you for everything.

l294265421 commented 1 year ago
1. Steps 1 and 2 can each be run and tested on their own, but step 3 depends on the outputs of both: the supervised fine-tuned (SFT) model from step 1 and the reward model from step 2 (see the first sketch after this list).
2. The chatbot in this repo (alpaca_rlhf/inference/llama_chatbot_gradio.py) supports multi-turn dialogue. That is, until you click the Clear History button, the history, including both what the users say and the responses the chatbot generates, is used as input for the next turn (see the second sketch below). The training data (https://huggingface.co/datasets/Dahoas/rm-static) also includes multi-turn dialogue data.
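
For illustration, here is a minimal, runnable sketch of that dependency. The function names (`train_sft`, `train_reward_model`, `train_ppo`) are hypothetical placeholders, not this repo's actual entry points; only the data flow between the three steps is the point.

```python
# Hypothetical placeholders standing in for the real training scripts,
# used only to show which steps depend on which.

def train_sft(base_model: str) -> str:
    # Step 1: supervised fine-tuning; produces the initial actor model.
    return f"{base_model}-sft"

def train_reward_model(base_model: str) -> str:
    # Step 2: reward-model training; independent of step 1.
    return f"{base_model}-rm"

def train_ppo(actor_ckpt: str, reward_ckpt: str) -> str:
    # Step 3: RLHF with PPO; consumes the actor from step 1
    # and the reward model from step 2.
    return f"{actor_ckpt}+ppo(reward={reward_ckpt})"

actor = train_sft("llama-7b")             # can be run and tested alone
reward = train_reward_model("llama-7b")   # can be run and tested alone
final = train_ppo(actor, reward)          # only runs after steps 1 and 2
print(final)
```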
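
And here is a minimal sketch of how multi-turn history can be flattened into a single prompt in the `Human:`/`Assistant:` style that Dahoas/rm-static uses. `build_prompt` is a hypothetical helper, and the exact template in llama_chatbot_gradio.py may differ.

```python
def build_prompt(history, user_message):
    """Flatten earlier turns plus the new user message into one prompt.

    history: list of (user_text, bot_text) pairs from earlier turns.
    """
    prompt = ""
    for user_text, bot_text in history:
        prompt += f"\n\nHuman: {user_text}\n\nAssistant: {bot_text}"
    # The new turn ends with an open "Assistant:" for the model to complete.
    prompt += f"\n\nHuman: {user_message}\n\nAssistant:"
    return prompt

# Example: one earlier turn of history, then a follow-up question.
history = [("What is RLHF?",
            "RLHF is reinforcement learning from human feedback.")]
print(build_prompt(history, "How does step 3 use it?"))
```

Clearing the history (the Clear History button) simply corresponds to starting again with an empty `history` list.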