Describe the issue

Issue: I am trying to do visual instruction tuning using the pretrained projector liuhaotian/llava-pretrain-llama-2-7b-chat, but I hit the error below. I downloaded the projector from https://huggingface.co/liuhaotian/llava-pretrain-llama-2-7b-chat to ./checkpoints/llava-pretrain-llama-2-7b-chat. According to the guide in https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune.sh and https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md, I think I should use meta-llama/Llama-2-7b-chat-hf during fine-tuning. Please check the details in the log section below.

Command:

Log:

2024-05-15 11:48:42.708 ERROR train - global_exception_handler: Uncaught exception Error(s) in loading state_dict for Sequential:
Missing key(s) in state_dict: "0.weight", "0.bias", "2.weight", "2.bias".
Unexpected key(s) in state_dict: "weight", "bias".
NoneType: None
2024-05-15 11:48:42.708 ERROR train - global_exception_handler: <class 'RuntimeError'>
2024-05-15 11:48:42.709 ERROR train - global_exception_handler:
File "/data/orlando/workspace/AndroidAgentModelZoo/models/LLaVA_forward/llava/train/train_mem.py", line 4, in <module>
train(attn_implementation="flash_attention_2")
File "/data/orlando/workspace/AndroidAgentModelZoo/models/LLaVA_forward/llava/train/train.py", line 1302, in train
model.get_model().initialize_vision_modules(
File "/data/orlando/workspace/AndroidAgentModelZoo/models/LLaVA_forward/llava/model/llava_arch.py", line 97, in initialize_vision_modules
self.mm_projector.load_state_dict(get_w(mm_projector_weights, 'mm_projector'))
File "/usr/local/anaconda3/envs/agentbackend/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
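For reference, the mismatch is visible if I load the downloaded checkpoint and print its keys directly (a minimal diagnostic sketch; the path is just my local copy described above):

```python
import torch

# Minimal diagnostic: print the keys stored in the pretrained projector
# checkpoint (path assumes the local download described above).
weights = torch.load(
    "./checkpoints/llava-pretrain-llama-2-7b-chat/mm_projector.bin",
    map_location="cpu",
)
for name, tensor in weights.items():
    print(name, tuple(tensor.shape))

# After llava_arch.py strips the "mm_projector." prefix via get_w, keys
# like "weight"/"bias" match a single nn.Linear, while the Sequential
# being built expects "0.weight", "0.bias", "2.weight", "2.bias".
```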
I guess this may be caused by an inconsistency between --model_name_or_path and the base model the projector was trained with. However, in the projector's config I can only see the model name ./checkpoints/llama_2/llama-2-7b-chat (https://huggingface.co/liuhaotian/llava-pretrain-llama-2-7b-chat/blob/main/config.json). Could you clarify which Llama 2 model I should use for --model_name_or_path?
PS: As I understand it, the pretraining phase focuses on language-image alignment (feature alignment): its goal is to train a projector that maps image features into the language embedding space. With this projector, we can then fine-tune on both language and images to improve task performance. My guess is that meta-llama/Llama-2-7b-chat-hf should be OK (it is the converted format of Meta's official Llama 2 release). Alternatively, according to https://github.com/haotian-liu/LLaVA/blob/main/docs/LLaVA_from_LLaMA2.md, I need to download the latest Llama 2 checkpoints and use those (I tried this, but it failed, because that format can't be loaded by the Hugging Face API).
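To make the key names above concrete, here is a minimal sketch (dimensions assumed: CLIP ViT-L/14 features projected into the LLaMA-2-7B hidden size) of the two projector shapes LLaVA can build, and the state_dict keys each produces:

```python
import torch.nn as nn

mm_hidden_size, hidden_size = 1024, 4096  # assumed CLIP / LLaMA-2-7B dims

# "linear" projector: a bare nn.Linear, so keys are "weight" / "bias"
linear_projector = nn.Linear(mm_hidden_size, hidden_size)

# "mlp2x_gelu" projector: an nn.Sequential, so keys become
# "0.weight", "0.bias", "2.weight", "2.bias" -- exactly the keys the
# error above reports as missing
mlp_projector = nn.Sequential(
    nn.Linear(mm_hidden_size, hidden_size),
    nn.GELU(),
    nn.Linear(hidden_size, hidden_size),
)

print(list(linear_projector.state_dict().keys()))  # ['weight', 'bias']
print(list(mlp_projector.state_dict().keys()))     # ['0.weight', '0.bias', '2.weight', '2.bias']
```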
Current follow-up:
Now I'm trying to use meta-llama/Llama-2-7b-chat-hf to pretrain a projector myself, and then follow the fine-tuning process.
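As a quick sanity check that this checkpoint is in the loadable Hugging Face format (an assumed snippet, not from the LLaVA repo; the meta-llama repo is gated and needs approved access):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# meta-llama/Llama-2-7b-chat-hf is the HF-converted release; Meta's
# original consolidated .pth checkpoints cannot be loaded this way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
print(model.config.hidden_size)  # expect 4096 for the 7B model
```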
Could you clarify which language model I should use with llava-pretrain-llama-2-7b-chat/mm_projector.bin? Please correct me if anything in my description is wrong.
Really appreciate your help
Orlando