Closed HarrySSH closed 1 year ago
Hi,
The problem seems to be a mismatch between the CLIP model configuration and the checkpoint. Which script are you using?
Thanks for your help!
I am using train_rl_model.sh
Did you run the first two steps of RLHF, i.e., reward model training and RL model initialization?
Yes! That's why I felt quite confused. :) The reward model is the 13b one, the model init is the 7b one, and the final train_rl_model.sh is in the 7b folder.
Note: For both 7b and 13b policy models, we use the same 13b reward model.
If you have done the 7b initialization, I will double-check the script.
And for your information: I tried changing the code to apply the position embedding directly based on the dimension of the embedding features instead of the input_ids, and then I get this error: "embeddings = embeddings + position_ids RuntimeError: The size of tensor a (1024) must match the size of tensor b (577) at non-singleton dimension 2"
Not sure if this is useful information, though.
> Note: For both 7b and 13b policy models, we use the same 13b reward model.
> If you have done the 7b initialization, I will double-check the script.
Yes. You made that very clear in the README :) I think I followed that part.
RuntimeError: The size of tensor a (577) must match the size of tensor b (257) at non-singleton dimension 1
I think you don't need to change the code. 577 = 24^2 + 1, and 257 = 16^2 + 1.
clip-vit-large-patch14-336 uses 336 / 14 = 24 patches per dimension, while clip-vit-large-patch14 uses 224 / 14 = 16. So the problem is that the loaded checkpoint is not the same as the one in the configuration.
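For reference, here is a minimal sketch (not from the repository, and it assumes access to the Hugging Face Hub) that reproduces this arithmetic from the published CLIP configs:

```python
# Minimal sketch: derive the expected sequence length (patches + CLS token)
# from each CLIP vision config, reproducing the 257 vs. 577 mismatch above.
from transformers import CLIPConfig

for name in ["openai/clip-vit-large-patch14", "openai/clip-vit-large-patch14-336"]:
    vision_cfg = CLIPConfig.from_pretrained(name).vision_config
    patches_per_side = vision_cfg.image_size // vision_cfg.patch_size
    num_positions = patches_per_side ** 2 + 1  # +1 for the CLS token
    print(f"{name}: {patches_per_side}x{patches_per_side} patches -> {num_positions} positions")

# Expected output:
# openai/clip-vit-large-patch14: 16x16 patches -> 257 positions
# openai/clip-vit-large-patch14-336: 24x24 patches -> 577 positions
```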
Ah, thank you so much!
I found the issue. The 7b initialization script used openai/clip-vit-large-patch14, but the 13b reward model script uses VISION_TOWER=openai/clip-vit-large-patch14-336.
I guess that's where the bug happened.
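For anyone hitting the same thing, here is a rough sketch for comparing which vision tower each checkpoint expects (the paths are placeholders, and I am assuming a LLaVA-style config.json that records the vision tower under "mm_vision_tower"):

```python
# Rough sketch with placeholder paths: print the vision tower recorded in each
# checkpoint's config.json so the policy init and reward model can be compared.
import json
import os

def vision_tower_of(checkpoint_dir):
    with open(os.path.join(checkpoint_dir, "config.json")) as f:
        return json.load(f).get("mm_vision_tower", "<not set>")

# Replace these with your local checkpoint directories.
print(vision_tower_of("path/to/7b_policy_init"))
print(vision_tower_of("path/to/13b_reward_model"))
```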
Hi, the use of different vision modules is by design. I will rerun the script and see whether I can reproduce the bug. Thank you for your patience.
Ah! I am not saying what you wrote is incorrect, I just mean the two are inconsistent with each other, and that's entirely me misinterpreting your code! I am retraining the reward model with the other CLIP model and I will let you know if my issue is resolved.
Hi Harry,
I have checked the code, and the issue seems to be that in the cleaned-up script, the wrong base models are used for the policy and the reward model. The correct version should be:
POLICY_BASE_MODEL_NAME=LLaVA-RLHF-7b-v1.5-224/sft_model
RM_BASE_MODEL_NAME=LLaVA-RLHF-13b-v1.5-336/sft_model
The script is updated.
Also, one issue I found while re-running our public codebase is that transformers==4.31.0 should be used instead of transformers==4.33.0; otherwise, it causes a length mismatch due to an inconsistency in the tokenizer implementation between those transformers versions.
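As a quick guard (just a sketch, not something in the released code), you could assert the pinned version at the top of the training entry point:

```python
# Minimal guard (not part of the released code): fail fast if the environment
# does not match the transformers version the scripts were verified with.
import transformers

EXPECTED = "4.31.0"
if transformers.__version__ != EXPECTED:
    raise RuntimeError(
        f"Found transformers {transformers.__version__}; "
        f"install transformers=={EXPECTED} to avoid the tokenizer length mismatch."
    )
```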
Sorry for the inconvenience.
Thanks for the previous explanation. I encountered another bug here.
File "/.../anaconda3/envs/rlhf/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 200, in forward
    embeddings = embeddings + self.position_embedding(self.position_ids)
RuntimeError: The size of tensor a (577) must match the size of tensor b (257) at non-singleton dimension 1
I used the standard dataset and model to train the RL pipeline, just to make sure I could get it running. What do you think could be causing this bug?
Many thanks