Open lucasjinreal opened 8 months ago
The model you mentioned with a single visual path is exactly LLaVA-1.5. We conducted extensive comparisons in Table 1 of our paper; please check it out.
I noticed that when you enlarge the input size in llava-1.5, you interpolate the positional embedding after the position_ids are calculated.
This would noticeably drop performance, since the model hasn't seen large sizes during training.
What I mean is: have you run an experiment where the input size is enlarged by interpolating the position embedding weights first, and then training them along with the vision encoder or the full model?
What do you think the differences between these two ways would be?
(Your interpolated embedding does not seem to be a trainable parameter. I didn't see a resize_position_embedding step before training here, just an interpolation applied after the position_ids are calculated.)
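To make the distinction concrete, here is a minimal sketch (plain PyTorch; the function and variable names are mine, not taken from the LLaVA-HR code) of the resize-before-training approach, where the interpolated table is written back as a trainable parameter instead of being interpolated on the fly after position_ids are computed:

```python
import torch
import torch.nn.functional as F

def resize_vit_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """Interpolate a ViT position-embedding table from old_grid**2 + 1 to new_grid**2 + 1 tokens.

    pos_embed: (old_grid**2 + 1, dim), where row 0 is the [CLS] token embedding.
    Returns (new_grid**2 + 1, dim), ready to be copied back into the model so the
    embeddings stay trainable at the new resolution.
    """
    cls_tok, grid_tok = pos_embed[:1], pos_embed[1:]                 # split off [CLS]
    dim = pos_embed.shape[-1]
    grid_tok = grid_tok.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    grid_tok = F.interpolate(grid_tok, size=(new_grid, new_grid),
                             mode="bicubic", align_corners=False)
    grid_tok = grid_tok.permute(0, 2, 3, 1).reshape(new_grid * new_grid, dim)
    return torch.cat([cls_tok, grid_tok], dim=0)

# Example: resize a CLIP ViT-L/14 table from 336px (24x24 patches) to 448px (32x32 patches)
# once, before training starts, and register it as a trainable embedding again.
# (The model's position_ids buffer and config.image_size would also need updating.)
# new_table = resize_vit_pos_embed(vision_tower.embeddings.position_embedding.weight.data, 24, 32)
# vision_tower.embeddings.position_embedding = torch.nn.Embedding.from_pretrained(new_table, freeze=False)
```

After this step the resized table is an ordinary trainable parameter, so high-resolution fine-tuning can adapt it, rather than relying on a fixed interpolation at every forward pass.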
I get it; maybe the way you mentioned is better. Let me try it.
Nice, let me know the differences between them after you've tried it.
@luogen1996 Hello, I am running the stage-2 SFT following your code, fine-tuning with ZeRO-3, and got some warnings:
- vision_model.head.layernorm.bias: found shape torch.Size([1152]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.head.layernorm.weight: found shape torch.Size([1152]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.head.mlp.fc1.bias: found shape torch.Size([4304]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.head.mlp.fc1.weight: found shape torch.Size([4304, 1152]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.head.mlp.fc2.bias: found shape torch.Size([1152]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.head.mlp.fc2.weight: found shape torch.Size([1152, 4304]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.head.probe: found shape torch.Size([1, 1, 1152]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.post_layernorm.bias: found shape torch.Size([1152]) in the checkpoint and torch.Size([0]) in the model instantiated
- vision_model.post_layernorm.weight: found shape torch.Size([1152]) in the checkpoint and torch.Size([0]) in the model instantiated
There are some reports (https://github.com/microsoft/DeepSpeed/issues/3574) indicating this is related to enabling gradient_checkpointing and ZeRO-3 at the same time.
Does this affect model training? The loss looks normal.
I don't see these warnings in my logs. The weights you printed are not used in the model, so you can probably ignore them.
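For context, torch.Size([0]) in those warnings is typically what ZeRO-3 reports for parameters that are partitioned across ranks. A minimal sketch (assuming an initialized DeepSpeed ZeRO-3 engine; `model` here stands for the loaded model and the parameter-name prefixes are taken from the warning, not from the LLaVA-HR code) that gathers the flagged weights to confirm their real shapes:

```python
import deepspeed

# Under ZeRO-3 each rank only holds a shard of every parameter, so a plain
# p.shape (and the shape check in from_pretrained) sees torch.Size([0]).
head_params = [p for n, p in model.named_parameters()
               if n.startswith(("vision_model.head", "vision_model.post_layernorm"))]

# Temporarily gather the full tensors just to inspect them.
with deepspeed.zero.GatheredParameters(head_params, modifier_rank=None):
    for n, p in model.named_parameters():
        if n.startswith(("vision_model.head", "vision_model.post_layernorm")):
            print(n, tuple(p.shape))   # e.g. vision_model.head.probe -> (1, 1, 1152)
```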
In Table 1, is LLaVA-1.5 trained at different resolutions, or only evaluated at different resolutions?
@luogen1996 Thanks~
Yes, we train LLaVA-1.5 at different resolutions. The training settings are the same as LLaVA-HR, which includes low-resolution pre-training and high-resolution instruction tuning.
Hello, I also hit the same ZeRO-3 warnings. Could you please share how you resolved them?
What if the added ConvNeXt vision encoder is not used?