texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

Loading failed when using Mistral model #118

Closed: sunxiaojie99 closed this issue 2 months ago

sunxiaojie99 commented 2 months ago

Hi! When I use Mistral-7B-Instruct-v0-1 as the base model and run repllama training with LoRA, I get errors like this when loading the trained model: "size mismatch for base_model.model.model.layers.29.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, while the shape in the current model is torch.Size([14336, 8])."

The error is raised at this line: lora_model = PeftModel.from_pretrained(base_model, lora_name_or_path, config=lora_config).
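
The checkpoint side of the error reports shape torch.Size([0]), which suggests the saved LoRA tensors are empty rather than merely mis-shaped. One way to confirm that, sketched below, is to read only the header of the saved .safetensors file (an 8-byte little-endian length followed by a JSON table of dtype/shape/offsets per tensor) and list every tensor with zero elements. The function names and the file path in the usage comment are hypothetical. A plausible cause, stated here as an assumption rather than a diagnosis, is that DeepSpeed ZeRO-3 partitions parameters across ranks and the file was written without gathering them first.

```python
import json
import math
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the tensor table of a .safetensors file without loading weights.

    Layout: an 8-byte little-endian unsigned length N, then N bytes of
    UTF-8 JSON mapping tensor names to dtype / shape / data_offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    return header


def empty_tensors(path):
    """List tensor names whose stored shape has zero elements, e.g. [0]."""
    header = read_safetensors_header(path)
    return sorted(
        name for name, info in header.items() if math.prod(info["shape"]) == 0
    )


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_empty.py $model_save_path/<file>.safetensors
    for arg in sys.argv[1:]:
        for name in empty_tensors(Path(arg)):
            print("empty:", name)
```

If every lora_A / lora_B entry shows up as empty, the problem is in how the file was saved, not in the Mistral architecture or the loading code.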

I'm using transformers==4.38.0, because 4.33.0 does not support Mistral.
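
A small guard like the one below can fail fast when the installed transformers is too old for Mistral. It is a sketch under one assumption: that native Mistral support first shipped in transformers 4.34.0, which is consistent with 4.33.0 lacking it. The helper names are hypothetical; only importlib.metadata.version is a real standard-library call.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: Mistral first shipped in transformers 4.34.0.
MIN_MISTRAL_VERSION = (4, 34, 0)


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn '4.38.0' or '4.38.0.dev0' into (4, 38, 0); missing parts count as 0."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_mistral(text: str) -> bool:
    """True if the given transformers version is at least MIN_MISTRAL_VERSION."""
    return parse_version(text) >= MIN_MISTRAL_VERSION


if __name__ == "__main__":
    try:
        installed = version("transformers")
        print(installed, "supports Mistral:", supports_mistral(installed))
    except PackageNotFoundError:
        print("transformers is not installed")
```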

My commands are:

deepspeed --include localhost:0 --master_port 60000 --module tevatron.retriever.driver.train \
  --deepspeed deepspeed/ds_zero3_config.json \
  --output_dir $model_save_path \
  --model_name_or_path $model_path \
  --dataset_path $train_path \
  --save_steps 200 \
  --per_device_train_batch_size 8 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --train_group_size 16 \
  --dataloader_num_workers 1 \
  --learning_rate 1e-4 \
  --bf16 \
  --query_max_len $q_max_len \
  --passage_max_len $p_max_len \
  --num_train_epochs 1 \
  --logging_steps 10 \
  --pooling $pooling \
  --warmup_steps 100 \
  --lora \
  --lora_target_modules q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj \
  --overwrite_output_dir

CUDA_VISIBLE_DEVICES=0 python -m tevatron.retriever.driver.encode_cot \
  --output_dir=temp \
  --model_name_or_path $model_path \
  --lora_name_or_path $model_save_path \
  --query_prefix "" \
  --passage_prefix "" \
  --pooling $pooling \
  --normalize \
  --encode_is_query \
  --fp16 \
  --per_device_eval_batch_size 64 \
  --query_max_len $q_max_len \
  --passage_max_len $p_max_len \
  --dataset_path $dev_query_path \
  --encode_output_path $encode_path/dev_query_emb.pkl

aken12 commented 1 month ago

Hi @sunxiaojie99, I'm facing the same problem (with the Mistral-7B model). If you have solved it, could you share how?

sunxiaojie99 commented 1 month ago

Hi! Yes, I have solved this problem. In my case, after many attempts, I found that deleting the "safetensors" file in the output directory lets the model load successfully.
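
The workaround can be made reversible by renaming the file instead of deleting it. The sketch below assumes, as a labeled assumption, that the output directory also holds PEFT's non-safetensors adapter file (named adapter_model.bin here) and that the loader falls back to it once no .safetensors file is present. If that fallback file is missing, the hypothetical helper prefer_bin_weights renames nothing, so the directory is never left without weights.

```python
from pathlib import Path


def prefer_bin_weights(output_dir, fallback_name="adapter_model.bin"):
    """Rename top-level *.safetensors files to *.safetensors.bak.

    Assumption: fallback_name is the .bin adapter file saved alongside the
    safetensors file. If it does not exist, nothing is renamed.
    Returns the names of the backup files that were created.
    """
    out = Path(output_dir)
    if not (out / fallback_name).is_file():
        return []
    renamed = []
    for weights in sorted(out.glob("*.safetensors")):
        backup = weights.with_name(weights.name + ".bak")
        weights.rename(backup)
        renamed.append(backup.name)
    return renamed


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python prefer_bin.py $model_save_path
    for directory in sys.argv[1:]:
        print(prefer_bin_weights(directory))
```

Running it on $model_save_path before the encode command mirrors the manual deletion; renaming a .bak file back restores the original state. According to the maintainer's reply in this thread, a later fix should make the workaround unnecessary.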

MXueguang commented 1 month ago

Hi, sorry for the late reply. Thanks to the latest PR from @ArvinZhuang, the safetensors issue should now be fixed. Feel free to follow up if the error persists.

aken12 commented 1 month ago

Thank you for your support. Deleting the "safetensors" file in the output directory fixed my issue as well.

And I will also try the revised code :)