rl-stablediffusion training

kohya-ss / sd-scripts

Apache License 2.0

4.54k stars 771 forks

rl-stablediffusion training #575

Open nicolai256 opened 1 year ago

nicolai256 commented 1 year ago

Any chance you could implement this? https://github.com/vinhkhuc/ddpo/tree/support_gpu It's for RLHF-style training. Check the paper; it could be really interesting for LoRA and fine-tuning.

kohya-ss commented 1 year ago

Thank you for letting me know. It is interesting!

I briefly checked the code and could not figure out how it calculates the total loss. It requires further investigation, and it will be some time before I can implement it.
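For reference, here is a rough sketch of how a DDPO-style total loss can be assembled. It follows the score-function (REINFORCE) variant described in the DDPO paper, not the code in that repo. Assumptions: each denoising step is treated as a Gaussian policy N(mu, sigma^2 I), rewards are normalized into advantages across the whole batch (a simplification), and each sample's log-probs are summed over steps and then averaged over samples. The names `gaussian_log_prob`, `normalize_rewards` and `ddpo_sf_loss` are hypothetical. It only computes the scalar value; in a real trainer mu would come from the UNet (or UNet + LoRA) and autograd would differentiate the loss.

```python
import math
from statistics import mean, pstdev


def gaussian_log_prob(x, mu, sigma):
    """Log-density of x under an isotropic Gaussian N(mu, sigma^2 I), summed over dimensions."""
    return sum(
        -((xi - mi) ** 2) / (2 * sigma ** 2) - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        for xi, mi in zip(x, mu)
    )


def normalize_rewards(rewards, eps=1e-8):
    """Turn raw rewards into advantages: subtract the batch mean, divide by the batch std."""
    m = mean(rewards)
    s = pstdev(rewards)
    return [(r - m) / (s + eps) for r in rewards]


def ddpo_sf_loss(trajectories, rewards):
    """Hypothetical score-function loss: -(1/N) * sum_i A_i * sum_t log p(x_{t-1} | x_t).

    trajectories: one list per sample; each step is a tuple (x_prev, mu, sigma), where
      x_prev is the sampled x_{t-1} and mu, sigma parameterize the policy at that step.
    rewards: one scalar reward per sample (e.g. from an aesthetic or preference model).
    """
    advantages = normalize_rewards(rewards)
    per_sample = []
    for steps, adv in zip(trajectories, advantages):
        # Sum the log-probs of every denoising step in this trajectory.
        logp = sum(gaussian_log_prob(x_prev, mu, sigma) for x_prev, mu, sigma in steps)
        # Minimizing -A * log p raises log p for above-average samples.
        per_sample.append(-adv * logp)
    return mean(per_sample)
```

With two one-step samples, the higher-reward one gets a positive advantage, so minimizing this loss raises its log-probability and lowers the other's.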

nicolai256 commented 1 year ago

I looked for it and found some loss functions in /ddpo/training/policy_gradient.py.
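The other variant in the DDPO paper uses importance sampling with PPO-style clipping, which allows several optimizer steps on the same batch of generated samples. A loss function of roughly this shape may be what lives in policy_gradient.py, but this sketch was not checked against that file. The function names and the default `clip_range=0.2` are assumptions; as above, only the scalar value is computed, and a real trainer would get `logp_new` from the current UNet so autograd can differentiate it.

```python
import math
from statistics import mean


def clipped_step_loss(logp_new, logp_old, advantage, clip_range=0.2):
    """PPO-style clipped surrogate for one denoising step, written as a loss to minimize."""
    ratio = math.exp(logp_new - logp_old)  # importance weight pi_new / pi_old
    unclipped = -advantage * ratio
    clipped = -advantage * min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Taking the max of the two losses is the pessimistic (clipped) PPO objective.
    return max(unclipped, clipped)


def ddpo_is_loss(logp_new, logp_old, advantages, clip_range=0.2):
    """Hypothetical total loss: mean of the clipped per-step losses over samples and steps.

    logp_new, logp_old: [num_samples][num_steps] log-probs of the sampled x_{t-1} under the
      current policy and under the policy that generated the samples.
    advantages: one normalized advantage per sample, shared by all of its steps.
    """
    terms = [
        clipped_step_loss(ln, lo, adv, clip_range)
        for new_row, old_row, adv in zip(logp_new, logp_old, advantages)
        for ln, lo in zip(new_row, old_row)
    ]
    return mean(terms)
```

The clipping keeps the ratio near 1: once a good sample's probability has grown by more than `clip_range`, further increases no longer lower the loss, which limits how far each update can move the model away from the one that generated the batch.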