llava-rlhf / LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF
https://llava-rlhf.github.io/
GNU General Public License v3.0

Will the RM be released? #14

Closed findalexli closed 1 year ago

Edward-Sun commented 1 year ago

Hi, the RM is released:

https://github.com/llava-rlhf/LLaVA-RLHF/tree/main/RLHF#1-training-the-instruction-following-reward-model

Note: For both the 7b and 13b policy models, we use the same 13b reward model. We also provide the pretrained reward model checkpoint at LLaVA-RLHF-13b-v1.5-336/rm_lora_adapter_model. To use the pretrained LoRA checkpoint, the base_model_name_or_path field in adapter_config.json needs to be modified to the actual path of the SFT model.
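
For reference, here is a minimal sketch of that edit; the SFT model path below is a placeholder you would replace with the location of your own checkpoint:

```python
# Minimal sketch: point the LoRA adapter config at a local SFT model.
# The paths here are placeholders, not paths shipped with the repo.
import json
from pathlib import Path

adapter_dir = Path("LLaVA-RLHF-13b-v1.5-336/rm_lora_adapter_model")
config_path = adapter_dir / "adapter_config.json"

# Load the adapter config, rewrite the base model path, and save it back.
config = json.loads(config_path.read_text())
config["base_model_name_or_path"] = "/path/to/your/sft-model"  # replace with your SFT model
config_path.write_text(json.dumps(config, indent=2))
```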