lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
MIT License
7.71k stars 669 forks

Is it possible to replace PaLM with another Hugging Face pretrained language model? #24

Open noanti opened 1 year ago

noanti commented 1 year ago

Like BLOOM or T5?

kisseternity commented 1 year ago

Sure, it can be done. I've completed this, and it runs with the RLHF process.

kungfu-eric commented 1 year ago

EDIT: Yeah, I'm calling BS. I had a go, and there are numerous reasons why subclassing and monkeypatching the Hugging Face implementation doesn't work. You have to rip the model architecture out of Hugging Face and manually swap it in for the PaLM architecture here. The forward pass and generation in this repo are customized enough that subclassing and monkeypatching don't work.

Unfortunately, ripping out the model architecture then makes weight loading kind of janky, and usability definitely suffers. I'm a bit confused why @lucidrains didn't just build on HF models like everyone else.
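
To make the kind of interface mismatch described above concrete, here is a toy sketch in plain Python. Every class and function name in it is a hypothetical stand-in, not code from this repo or from transformers, and the `return_loss` keyword is only an assumed example of a customized forward signature. It shows a caller that fails against a hub-style model directly and works once a thin adapter translates the call. It says nothing about generation or weight loading, which are the harder parts.

```python
# Hypothetical stand-ins only: nothing here is imported from
# PaLM-rlhf-pytorch or transformers.


class HubStyleModel:
    """Stand-in for a pretrained model whose forward returns an output object."""

    class Output:
        def __init__(self, logits):
            self.logits = logits

    def forward(self, input_ids):
        # Toy "logits": one float per token, just so something is returned.
        return HubStyleModel.Output([float(t) for t in input_ids])


class TrainerStyleCaller:
    """Stand-in for training code written against a customized forward."""

    def step(self, model, tokens):
        # Passes an extra keyword and expects raw logits back,
        # not an output object (assumed behavior for this sketch).
        return model.forward(tokens, return_loss=False)


class Adapter:
    """Wraps the hub-style model and exposes the signature the caller expects."""

    def __init__(self, inner):
        self.inner = inner

    def forward(self, tokens, return_loss=False):
        if return_loss:
            raise NotImplementedError("loss computation is left out of this sketch")
        # Unwrap the output object so the caller gets raw logits.
        return self.inner.forward(tokens).logits


def try_direct(tokens):
    """Call the hub-style model directly; report the failure instead of raising."""
    try:
        return TrainerStyleCaller().step(HubStyleModel(), tokens)
    except TypeError as err:
        return f"failed: {type(err).__name__}"


def try_adapted(tokens):
    """Same call, routed through the adapter."""
    return TrainerStyleCaller().step(Adapter(HubStyleModel()), tokens)
```

Calling `try_direct([1, 2, 3])` hits the unexpected keyword and reports a `TypeError`, while `try_adapted([1, 2, 3])` returns the toy logits. A real port would need the same kind of translation for every customized entry point, which is where the effort goes.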

EDIT 2: For anyone coming later who is on the HF stack (everyone), follow along here instead: https://huggingface.co/blog/stackllama. Save yourself the time.