Open nicolai256 opened 1 year ago
Thank you for letting me know. It is interesting!
I briefly checked the code and could not figure out how it calculates the total loss. It requires further investigation, and it will be some time before I can implement it.
I looked for it and found some loss functions in /ddpo/training/policy_gradient.py
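For what it's worth, here is a rough sketch of how a PPO-style clipped policy-gradient loss is usually computed, which as far as I understand is the kind of loss the DDPO paper describes for its importance-sampling variant. This is not the code from that repository: the function name, its arguments, and the default clip range are all assumptions for illustration, and in a diffusion setting the log-probabilities would be per denoising step.

```python
import math


def clipped_pg_loss(new_logps, old_logps, advantages, clip_range=0.2):
    """PPO-style clipped policy-gradient loss, averaged over samples.

    Hypothetical helper for illustration only; not taken from the ddpo repo.
    new_logps / old_logps: log-probabilities of the sampled actions under the
    current policy and under the policy that generated the samples.
    advantages: normalized rewards (one per sample).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Importance weight between the current and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_range, 1 + clip_range].
        clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Take the more pessimistic of the unclipped and clipped losses.
        total += max(-adv * ratio, -adv * clipped)
    return total / len(new_logps)
```

When the two policies agree (ratio of 1), this reduces to the negative mean advantage, so the clipping only matters once the policy has moved away from the one that produced the samples.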
Any chance you could implement this? https://github.com/vinhkhuc/ddpo/tree/support_gpu It's for RLHF-style training. Check out the paper; it could be really interesting for LoRA and fine-tuning.