Open noanti opened 1 year ago
Sure, it can be done. I've completed this and it runs with the RLHF process.
EDIT: Yeah, I'm calling BS. I had a go, and there are numerous reasons why subclassing and monkeypatching the Hugging Face implementation doesn't work. You have to rip out the model arch and manually replace the PaLM arch. The forward pass and generation in this repo are customized enough that subclassing and monkeypatching don't work.
Unfortunately, ripping out the model arch then makes the weight loading kind of janky. Usability definitely suffers. A bit confused why @lucidrains didn't just build from HF models like everyone else.
EDIT 2: For anyone coming later who is on the HF stack (everyone), follow along here: https://huggingface.co/blog/stackllama. Save your time.
just like bloom or t5?