llava-rlhf / LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF
https://llava-rlhf.github.io/
GNU General Public License v3.0

How to use the reward model in isolation? #28

Closed: jxgu1016 closed this issue 3 months ago

jxgu1016 commented 6 months ago

I want to use the reward model to compute rewards offline for some QA pairs. Is there any demo code?

Edward-Sun commented 3 months ago

Hey @jxgu1016, you can refer to this issue; using the reward model on its own seems quite straightforward.
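The thread itself contains no code. What follows is a minimal sketch of the offline scoring loop the question asks about. It assumes the loaded reward model has already been wrapped in a batch-scoring function that returns one scalar per QA pair. `QAPair`, `BatchRewardFn`, `score_offline`, and `to_jsonl_lines` are hypothetical names and are not part of the LLaVA-RLHF codebase. The toy scorer is only a placeholder for the real model's forward pass.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence


@dataclass
class QAPair:
    """One item to score. Field names are hypothetical, not from LLaVA-RLHF."""
    image_path: str  # the LLaVA-RLHF reward model is multimodal, so keep the image reference
    question: str
    answer: str


# Hypothetical interface: a callable wrapping the loaded reward model that
# returns one scalar reward per QA pair in the batch, in the same order.
BatchRewardFn = Callable[[Sequence[QAPair]], Sequence[float]]


def batched(items: Sequence[QAPair], batch_size: int) -> Iterator[Sequence[QAPair]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_offline(pairs: Sequence[QAPair], reward_fn: BatchRewardFn,
                  batch_size: int = 8) -> List[float]:
    """Score every pair offline, batch by batch, preserving input order."""
    scores: List[float] = []
    for batch in batched(pairs, batch_size):
        batch_scores = list(reward_fn(batch))
        if len(batch_scores) != len(batch):
            raise ValueError(
                f"reward_fn returned {len(batch_scores)} scores for {len(batch)} pairs")
        scores.extend(float(s) for s in batch_scores)
    return scores


def to_jsonl_lines(pairs: Sequence[QAPair], scores: Sequence[float]) -> List[str]:
    """Serialize each pair with its reward as one JSON line for later analysis."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [
        json.dumps({"image": p.image_path, "question": p.question,
                    "answer": p.answer, "reward": s})
        for p, s in zip(pairs, scores)
    ]


if __name__ == "__main__":
    # Placeholder scorer: answer length in words. Replace it with a call into
    # the actual reward model. This toy function says nothing about answer quality.
    def toy_reward_fn(batch: Sequence[QAPair]) -> List[float]:
        return [float(len(p.answer.split())) for p in batch]

    data = [
        QAPair("img_001.jpg", "What color is the car?", "The car is red."),
        QAPair("img_002.jpg", "How many people are there?", "Two."),
    ]
    rewards = score_offline(data, toy_reward_fn, batch_size=1)
    for line in to_jsonl_lines(data, rewards):
        print(line)
```

Keeping the model call behind a single batch function separates the batching and bookkeeping from however the reward model is actually loaded and prompted. The printed JSON lines can be redirected to a file for later analysis.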