How to use the reward model in isolation?

llava-rlhf / LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

https://llava-rlhf.github.io/

GNU General Public License v3.0


Status: Closed (closed by jxgu1016 3 months ago)

jxgu1016 commented 6 months ago

I want to use the reward model to compute rewards offline for some QA pairs. Is there any demo code?

Edward-Sun commented 3 months ago

Hey @jxgu1016, you can refer to this issue. Using the reward model on its own seems quite straightforward.
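
For readers who want a starting point, here is a minimal sketch of what an offline scoring loop could look like. It is not the repository's code: `score_offline`, `batched`, and `dummy_reward_fn` are hypothetical names, and `reward_fn` is a placeholder for whatever callable wraps the loaded reward model (the actual loading and forward-pass code is not shown in this thread). A multimodal reward model would also need the image for each QA pair, which this sketch leaves out.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def batched(items: Sequence[QAPair], batch_size: int) -> Iterator[Sequence[QAPair]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_offline(
    qa_pairs: Sequence[QAPair],
    reward_fn: Callable[[Sequence[QAPair]], Sequence[float]],
    batch_size: int = 8,
) -> List[float]:
    """Score QA pairs batch by batch and return one reward per pair, in order.

    `reward_fn` is an assumption: any callable that maps a batch of
    (question, answer) pairs to one scalar reward per pair.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rewards: List[float] = []
    for batch in batched(qa_pairs, batch_size):
        batch_rewards = reward_fn(batch)
        if len(batch_rewards) != len(batch):
            raise ValueError("reward_fn must return one reward per QA pair")
        rewards.extend(float(r) for r in batch_rewards)
    return rewards


def dummy_reward_fn(batch: Sequence[QAPair]) -> List[float]:
    """Stand-in scorer (answer length in words), NOT the LLaVA-RLHF reward model."""
    return [float(len(answer.split())) for _, answer in batch]


if __name__ == "__main__":
    pairs = [
        ("What is in the image?", "A dog on a beach"),
        ("What color is the car?", "Red"),
    ]
    for (question, answer), reward in zip(pairs, score_offline(pairs, dummy_reward_fn)):
        print(f"{reward:.1f}\t{question}\t{answer}")
```

To use it with a real model, replace `dummy_reward_fn` with a function that tokenizes each batch, runs the reward model in inference mode, and returns its scalar outputs.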