llava-rlhf / LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF
https://llava-rlhf.github.io/
GNU General Public License v3.0

reward base model missing #27

Closed · Ritz111 closed this issue 1 month ago

Ritz111 commented 6 months ago

Hi, when I load the pretrained reward model, the base model "llava-vicuna-v1-5-13b-336-finetune-final-padding" cannot be found. Is it uploaded on https://huggingface.co/? Could you provide a link to download it? Thanks!

Ritz111 commented 6 months ago

When I used "LLaVA-RLHF-13b-v1.5-336/sft_model" instead of "llava-vicuna-v1-5-13b-336-finetune-final-padding" to initialize the reward model, the following error occurred:

  File "/home/.local/lib/python3.9/site-packages/transformers/generation/utils.py", line 2678, in sample
    respond_outputs = unwrapped_policy.respond(
  File "LLaVA-RLHF-main/RLHF/models/rl_models.py", line 339, in respond
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Is this error caused by a mistake in initializing the reward model? The training code and datasets are the same as in your original repo.
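
(For context: torch.multinomial raises exactly this RuntimeError when the probability tensor contains non-finite or negative entries, which usually means the policy emitted non-finite logits, e.g. because it was initialized from mismatched base weights. A minimal sketch reproducing the error, plus a hedged pre-sampling check; safe_sample is a hypothetical helper, not part of the repo:)

    import torch

    # Minimal reproduction: multinomial rejects non-finite or negative probabilities.
    probs = torch.tensor([0.5, float("nan"), 0.5])
    try:
        torch.multinomial(probs, num_samples=1)
    except RuntimeError as e:
        print(e)  # probability tensor contains either `inf`, `nan` or element < 0

    # Hedged pre-sampling check (hypothetical helper, not from the repo).
    # Expects logits of shape (batch, vocab), as in the RLHF rollout code.
    def safe_sample(logits: torch.Tensor) -> torch.Tensor:
        if not torch.isfinite(logits).all():
            raise ValueError("non-finite logits: check the policy/base-weight match")
        probs = torch.softmax(logits, dim=-1)
        return torch.multinomial(probs, num_samples=1).squeeze(1)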

Edward-Sun commented 6 months ago

Hi @Ritz111, did you install the llava model as described in https://github.com/llava-rlhf/LLaVA-RLHF/tree/main/llava_setup?

Ritz111 commented 6 months ago

> Hi @Ritz111, did you install the llava model as described in https://github.com/llava-rlhf/LLaVA-RLHF/tree/main/llava_setup?

Sure, otherwise I cannot run the repo.

luckyyangrun commented 2 months ago

Hi @Edward-Sun, is there any update? Is "llava-vicuna-v1-5-13b-336-finetune-final-padding" the same as "LLaVA-RLHF-13b-v1.5-336/sft_model"?

Edward-Sun commented 1 month ago

Yeah "llava-vicuna-v1-5-13b-336-finetune-final-padding" is the same as "LLaVA-RLHF-13b-v1.5-336/sft_model" 👍
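
(So initializing the reward model from the released SFT checkpoint is the intended path. A minimal sketch of fetching just that checkpoint from the Hugging Face Hub; the repo id is an assumption based on the project page, adjust as needed:)

    from huggingface_hub import snapshot_download

    # Repo id is an assumption based on the project page; adjust if the
    # checkpoint lives elsewhere.
    local_dir = snapshot_download(
        repo_id="zhiqings/LLaVA-RLHF-13b-v1.5-336",
        allow_patterns=["sft_model/*"],  # only pull the SFT (base) weights
    )
    # Use this directory wherever the config expects
    # "llava-vicuna-v1-5-13b-336-finetune-final-padding".
    print(f"{local_dir}/sft_model")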