Open sglucas opened 1 year ago
Not sure what you mean. InstructGPT's architecture is the same as GPT-3's. InstructGPT is just GPT-3 fine-tuned using RLHF.
If you're asking whether I plan to implement the reward model and policy used to fine-tune InstructGPT, no, I will not be implementing that.
My goal with this repo is to provide a simple, readable, and hackable GPT implementation for educational purposes. RLHF is definitely outside the scope of that goal.
Hi, very nice repo.
May I ask whether you plan to reproduce ChatGPT/InstructGPT, or GPT with RLHF, in JAX?
Best