Hi, the RM is released here:
https://github.com/llava-rlhf/LLaVA-RLHF/tree/main/RLHF#1-training-the-instruction-following-reward-model
Note: For both the 7b and 13b policy models, we use the same 13b reward model. We also provide the pretrained reward model checkpoint at LLaVA-RLHF-13b-v1.5-336/rm_lora_adapter_model. To use the pretrained LoRA checkpoint, the base_model_name_or_path field in adapter_config.json needs to be changed to the actual path of the SFT model.
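A minimal sketch of that edit, assuming the adapter directory layout above and a placeholder SFT path (replace `/path/to/your/sft_model` with your actual checkpoint directory):

```python
import json

# Hypothetical local paths -- adjust to where you downloaded the checkpoints.
adapter_config_path = "LLaVA-RLHF-13b-v1.5-336/rm_lora_adapter_model/adapter_config.json"
sft_model_path = "/path/to/your/sft_model"

# Load the adapter config, repoint it at the local SFT base model, and write it back.
with open(adapter_config_path) as f:
    config = json.load(f)

config["base_model_name_or_path"] = sft_model_path

with open(adapter_config_path, "w") as f:
    json.dump(config, f, indent=2)
```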