lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
MIT License

Bug fix: Correct function call in RewardModel->finetune_parameters #10

Closed · QasimWani closed this issue 1 year ago

QasimWani commented 1 year ago

seems like it's missing a `self` call for `finetune_parameters`: https://github.com/lucidrains/PaLM-rlhf-pytorch/blob/795da603f5bde77d028ad05f7d8172189bfb7a2a/palm_rlhf_pytorch/palm_rlhf_pytorch.py#L529
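
(For context, here is a minimal hypothetical sketch of the kind of bug being reported: a method body that refers to `finetune_parameters` as a bare name instead of calling it through `self`, which raises a `NameError` at runtime. The class and attribute names below are illustrative stand-ins, not the repo's actual code.)

```python
from torch import nn

class PaLMStub(nn.Module):
    # illustrative stand-in that exposes a `finetune_parameters` method
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(512, 512)

    def finetune_parameters(self):
        return list(self.net.parameters())

class RewardModelSketch(nn.Module):
    # hypothetical reward model wrapper, not the repo's actual RewardModel
    def __init__(self, palm: PaLMStub):
        super().__init__()
        self.palm = palm
        self.to_pred = nn.Linear(512, 1)

    def finetune_parameters(self):
        # buggy form being reported: a bare `finetune_parameters(...)` call,
        # which resolves to nothing in scope and raises NameError
        # return [*self.to_pred.parameters(), *finetune_parameters()]

        # fixed form: route the call through `self`
        return [*self.to_pred.parameters(), *self.palm.finetune_parameters()]
```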

lucidrains commented 1 year ago

@QasimWani yes indeed, thank you for the pull request!

i've fixed it in https://github.com/lucidrains/PaLM-rlhf-pytorch/commit/bb9d4eb59762bbf07783c3e89d148068e4a762e5 (i also took the opportunity to address another issue there, which is why i didn't merge your PR directly - your PR was fine btw!)

QasimWani commented 1 year ago

awesome! in case you're curious, i found that bug by generating a graphical representation of your code with something i built over the past week: https://www.gctpy.com/graph/79f3f26b86b8ac37350d83307d8ad587d575a03a072b0fa7a77174d371772abf It helped me understand parts of your code faster.

I've open-sourced the repo: https://github.com/QasimWani/gct