mbzuai-oryx / LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

ValueError: Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" (which has shape torch.Size([128256, 4096])), this look incorrect. #31

Open · basteran opened 1 month ago

basteran commented 1 month ago

Hello everyone, thank you for your great work!

I am trying to further fine-tune the LLaVA architecture using your implementation with LLaMA 3 Instruct 8B. I can already fine-tune the Vicuna model using the original LLaVA code, and now I am looking for an implementation based on LLaMA 3.

I found your repo and followed the instructions in the README.md for each step. I am able to train the model using the following bash script, and it looks like it is saved correctly. NOTE: I downloaded the model from your Hugging Face repo.

TRAINING CODE

#!/bin/bash

################## MODELS #################
PROMPT_VERSION="llama3"
MODEL_DIR_PATH="/user/hf_models/"
MODEL_VERSION="LLaVA-Meta-Llama-3-8B-Instruct-FT"
MODEL_ABS_PATH=$MODEL_DIR_PATH/$MODEL_VERSION
################### END ###################

################## CUDA ####################
export CUDA_VISIBLE_DEVICES=0
echo "CUDA IS" ${CUDA_VISIBLE_DEVICES}
################## CUDA ####################

################# TRAINING #################
deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path $MODEL_ABS_PATH \
    --version $PROMPT_VERSION \
    --data_path ./data/train.json \
    --image_folder ./data/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --bf16 True \
    --output_dir ./checkpoints/llava-$MODEL_VERSION-lora \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 32 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.1 \
    --lr_scheduler_type "linear" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 1024 \
    --gradient_checkpointing False \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to none
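
The contents of the LoRA checkpoint written by the run above can be sanity-checked with a short sketch like the one below. This is only a quick check, and the non_lora_trainables.bin file name assumes LLaVA's usual LoRA outputs; adjust if your checkpoint differs.

# Quick sanity-check sketch for the LoRA checkpoint produced by the training run above.
# Assumes LLaVA's usual LoRA outputs (adapter weights plus non_lora_trainables.bin).
from pathlib import Path
import torch

ckpt = Path("./checkpoints/llava-LLaVA-Meta-Llama-3-8B-Instruct-FT-lora")

# List everything the run saved (adapter_config.json, adapter weights, config.json, ...).
print(sorted(p.name for p in ckpt.iterdir()))

# LLaVA's LoRA training also stores the non-LoRA trainables (e.g. the mm_projector)
# in a separate file; listing a few keys confirms they were written.
non_lora = torch.load(ckpt / "non_lora_trainables.bin", map_location="cpu")
print(list(non_lora)[:5])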

I then tried to merge the resulting adapters with the original model LLaVA-Meta-Llama-3-8B-Instruct-FT (using the merge_lora_weights.py script from LLaVA), and I got the following error.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading checkpoint shards:   0%|                                                                                                           | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 22, in <module>
    merge_lora(args)
  File "/user/mm-iglu-it/./scripts/merge_lora_weights.py", line 8, in merge_lora
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, device_map='cpu')
  File "/user/mm-iglu-it/llava/model/builder.py", line 64, in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3682, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4109, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/transformers/modeling_utils.py", line 887, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/user/anaconda3/envs/mm_iglu_it/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([128257, 4096]) in "weight" (which has shape torch.Size([128256, 4096])), this look incorrect.
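
For reference, the off-by-one in the error (128257 vs. 128256) can be narrowed down without loading the full model, for example by comparing the vocab_size declared in each config.json with the tokenizer length. This is only a diagnostic sketch using the paths from my setup, and it assumes a config.json was written next to the adapters, which LLaVA's LoRA training normally does.

# Diagnostic sketch: compare the vocab_size in each config.json with the tokenizer
# length to see which side carries the extra (128257th) token.
import json
from pathlib import Path
from transformers import AutoTokenizer

base_path = Path("/user/hf_models/LLaVA-Meta-Llama-3-8B-Instruct-FT")
lora_path = Path("./checkpoints/llava-LLaVA-Meta-Llama-3-8B-Instruct-FT-lora")

for name, path in [("base model", base_path), ("LoRA checkpoint", lora_path)]:
    cfg = json.loads((path / "config.json").read_text())
    print(f"{name}: config vocab_size = {cfg.get('vocab_size')}")

# The tokenizer shipped with the base model is a standard Llama-3 tokenizer.
tokenizer = AutoTokenizer.from_pretrained(str(base_path))
print("base tokenizer length:", len(tokenizer))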

Finally, I even tried using the adapters directly (without merging) with the following script, but I get exactly the same error. The file llava/eval/test_llava.py is very similar to the inference script from the original LLaVA repo; I only made a few small changes for my convenience (such as --prompt-version, --input-file-path, etc.).

TESTING CODE

#!/bin/bash

##################################### MODEL #####################################
PROMPT_VERSION="llama3"
MODEL_NAME="llava-LLaVA-Meta-Llama-3-8B-Instruct-FT-lora"
MODEL_BASE="LLaVA-Meta-Llama-3-8B-Instruct-FT"
################################## CHOOSE CUDA ##################################
export CUDA_VISIBLE_DEVICES=0
echo "CUDA is" ${CUDA_VISIBLE_DEVICES}
###################################### END ######################################

#################################### TESTING ####################################
deepspeed ./llava/eval/test_llava.py \
    --model-path ./checkpoints/$MODEL_NAME \
    --model-base /user/hf_models/$MODEL_BASE \
    --model-name $MODEL_NAME \
    --prompt-version $PROMPT_VERSION \
    --input-file-path ./data/test.json \
    --image-path ./data/images 
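
The shapes actually stored in the base checkpoint can also be inspected directly, independent of any config, which makes it clear which tensor really has 128257 rows. A small sketch, assuming the base model is saved as safetensors shards (inspect the .bin shards with torch.load otherwise):

# Sketch: print the embedding / lm_head shapes stored in the base model's shards,
# independent of any config. Assumes safetensors shards; adjust for .bin files.
from pathlib import Path
from safetensors import safe_open

base_path = Path("/user/hf_models/LLaVA-Meta-Llama-3-8B-Instruct-FT")

for shard in sorted(base_path.glob("*.safetensors")):
    with safe_open(str(shard), framework="pt") as f:
        for key in f.keys():
            if key.endswith("embed_tokens.weight") or key.endswith("lm_head.weight"):
                print(shard.name, key, f.get_slice(key).get_shape())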

Do you have any idea what I am doing wrong? I can't find anything online.

crux82 commented 1 month ago

I also have the same problem. I think it is connected to https://github.com/mbzuai-oryx/LLaVA-pp/issues/25

Can someone help us?

Thank you!!!