Hi @tamanna-mostafa, thanks for raising an issue!
Based on the error message, it looks as though the weights in the PEFT adapter file are corrupted or possibly empty. Outside of the script, are you able to run the following:
from peft import PeftModel
model = PeftModel.from_pretrained("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k")
If you look at the sizes of the files and shards for /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k, what do you see?
Note: your script won't run because get_args doesn't return anything. Also, you don't need to pass sharding or safe-serialization parameters when saving the tokenizer.
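A quick way to check this directly, as a minimal sketch (assuming the adapter really was saved in safetensors format):

```python
# List the tensors stored in the adapter file. A healthy LoRA adapter has
# many keys; a ~48-byte file is essentially just the safetensors header,
# so an empty/corrupted adapter prints 0 keys here.
from safetensors import safe_open

path = "/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k/adapter_model.safetensors"
with safe_open(path, framework="pt") as f:
    keys = list(f.keys())
print(len(keys), keys[:5])
```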
@amyeroberts Thanks for your comments. When I run the code you suggested, I get this error:
Traceback (most recent call last):
File "/mnt/efs/data/tammosta/files_t/debig_amy.py", line 4, in <module>
model = PeftModel.from_pretrained(model_id)
TypeError: PeftModel.from_pretrained() missing 1 required positional argument: 'model_id'
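(For reference, PeftModel.from_pretrained takes the base model as its first positional argument, so a two-argument call gets past this TypeError. A sketch, where the base checkpoint is a placeholder rather than anything confirmed in this thread:)

```python
# PeftModel.from_pretrained wants the base model first, then the adapter path.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder base checkpoint; substitute the model the adapter was trained on.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k")
```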
> If you look at the size of the files and shards for /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k, what do you see?
(ml_v4) ubuntu@ip-172-31-8-218:/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k$ ls -lh *
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 18:13 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 18:13 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 29 19:19 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 24 18:13 added_tokens.json
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 24 18:14 latest
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 24 18:13 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 24 18:13 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 24 18:13 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 24 18:13 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 24 18:13 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 24 18:14 zero_to_fp32.py
checkpoint-100:
total 2.4M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 23 21:32 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 23 21:32 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 23 21:32 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 23 21:32 added_tokens.json
drwxrwxr-x 2 ubuntu ubuntu 6.0K Jan 23 21:33 global_step100
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 23 21:34 latest
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_0.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_1.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_2.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_3.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_4.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_5.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_6.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 23 21:34 rng_state_7.pth
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 23 21:34 scheduler.pt
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 23 21:32 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 23 21:32 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 23 21:32 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 23 21:32 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 4.4K Jan 23 21:34 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 23 21:32 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 23 21:34 zero_to_fp32.py
checkpoint-200:
total 2.4M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 00:47 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 00:47 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 24 00:47 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 24 00:47 added_tokens.json
drwxrwxr-x 2 ubuntu ubuntu 6.0K Jan 24 00:48 global_step200
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 24 00:49 latest
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_0.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_1.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_2.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_3.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_4.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_5.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_6.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 00:49 rng_state_7.pth
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 24 00:49 scheduler.pt
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 24 00:47 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 24 00:47 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 24 00:47 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 24 00:47 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 7.9K Jan 24 00:49 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 24 00:47 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 24 00:49 zero_to_fp32.py
checkpoint-300:
total 2.4M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 04:02 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 04:02 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 24 04:02 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 24 04:02 added_tokens.json
drwxrwxr-x 2 ubuntu ubuntu 6.0K Jan 24 04:03 global_step300
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 24 04:04 latest
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_0.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_1.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_2.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_3.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_4.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_5.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_6.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 04:04 rng_state_7.pth
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 24 04:04 scheduler.pt
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 24 04:02 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 24 04:02 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 24 04:02 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 24 04:02 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 12K Jan 24 04:04 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 24 04:02 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 24 04:04 zero_to_fp32.py
checkpoint-400:
total 2.4M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 07:17 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 07:17 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 24 07:17 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 24 07:17 added_tokens.json
drwxrwxr-x 2 ubuntu ubuntu 6.0K Jan 24 07:18 global_step400
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 24 07:19 latest
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_0.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_1.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_2.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_3.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_4.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_5.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_6.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 07:19 rng_state_7.pth
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 24 07:19 scheduler.pt
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 24 07:17 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 24 07:17 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 24 07:17 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 24 07:17 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 15K Jan 24 07:19 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 24 07:17 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 24 07:19 zero_to_fp32.py
checkpoint-500:
total 2.5M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 10:32 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 10:32 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 24 10:32 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 24 10:32 added_tokens.json
drwxrwxr-x 2 ubuntu ubuntu 6.0K Jan 24 10:33 global_step500
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 24 10:34 latest
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_0.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_1.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_2.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_3.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_4.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_5.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_6.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 10:34 rng_state_7.pth
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 24 10:34 scheduler.pt
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 24 10:32 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 24 10:32 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 24 10:32 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 24 10:32 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 19K Jan 24 10:34 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 24 10:32 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 24 10:34 zero_to_fp32.py
checkpoint-600:
total 2.5M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 13:47 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 13:47 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 24 13:47 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 24 13:47 added_tokens.json
drwxrwxr-x 2 ubuntu ubuntu 6.0K Jan 24 13:48 global_step600
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 24 13:50 latest
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_0.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_1.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_2.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_3.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_4.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_5.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_6.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 13:50 rng_state_7.pth
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 24 13:50 scheduler.pt
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 24 13:47 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 24 13:47 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 24 13:47 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 24 13:47 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 22K Jan 24 13:50 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 24 13:47 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 24 13:50 zero_to_fp32.py
checkpoint-700:
total 2.5M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 17:03 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 17:03 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 48 Jan 24 17:03 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 133 Jan 24 17:03 added_tokens.json
drwxrwxr-x 2 ubuntu ubuntu 6.0K Jan 24 17:04 global_step700
-rw-rw-r-- 1 ubuntu ubuntu 14 Jan 24 17:05 latest
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_0.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_1.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_2.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_3.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_4.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_5.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_6.pth
-rw-rw-r-- 1 ubuntu ubuntu 16K Jan 24 17:05 rng_state_7.pth
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 24 17:05 scheduler.pt
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 24 17:03 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Jan 24 17:03 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 482K Jan 24 17:03 tokenizer.model
-rw-rw-r-- 1 ubuntu ubuntu 1.9K Jan 24 17:03 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 26K Jan 24 17:05 trainer_state.json
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Jan 24 17:03 training_args.bin
-rwxrw-r-- 1 ubuntu ubuntu 24K Jan 24 17:05 zero_to_fp32.py
final_checkpoint:
total 20M
-rw-rw-r-- 1 ubuntu ubuntu 5.1K Jan 24 18:14 README.md
-rw-rw-r-- 1 ubuntu ubuntu 676 Jan 24 18:14 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 20M Jan 24 18:14 adapter_model.safetensors
global_step736:
total 14G
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 31M Jan 24 18:14 bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_0_mp_rank_00_model_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_1_mp_rank_00_model_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_2_mp_rank_00_model_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_3_mp_rank_00_model_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_4_mp_rank_00_model_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_5_mp_rank_00_model_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_6_mp_rank_00_model_states.pt
-rw-rw-r-- 1 ubuntu ubuntu 1.7G Jan 24 18:14 zero_pp_rank_7_mp_rank_00_model_states.pt
One thing I notice in the listing above: adapter_model.safetensors is only 48 bytes in the root directory and in every numbered checkpoint, while the one in final_checkpoint is 20M. Also, if I take Llama 2 7B as the base model, the merge_peft_adaptors_gpu.py script works fine with final_checkpoint as the PEFT model path. So I tried the same thing here (Mistral 7B), running merge_peft_adaptors_gpu.py with final_checkpoint as the PEFT model path, and I get this error:
Loading checkpoint shards: 100%|██████████| 3/3 [00:04<00:00, 1.53s/it]
Loading PEFT: /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k/final_checkpoint
Traceback (most recent call last):
File "/mnt/efs/data/tammosta/scripts_hb/merge_peft_adaptors_gpu.py", line 51, in <module>
main()
File "/mnt/efs/data/tammosta/scripts_hb/merge_peft_adaptors_gpu.py", line 38, in main
model = PeftModel.from_pretrained(base_model, args.peft_model_path)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 354, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 698, in load_adapter
load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 241, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
size mismatch for base_model.model.model.layers.0.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
. . .
This is the same issue as reported in https://github.com/huggingface/transformers/issues/28688, so you can close this one if you don't want to keep duplicate tickets open. I'm still trying to understand what goes wrong when a fine-tuned Mistral 7B is used as the base model.
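One avenue I want to try (my assumption, since the run used DeepSpeed ZeRO, judging by the global_step* shards and zero_to_fp32.py in the output directory): the torch.Size([0]) entries suggest the adapter tensors were written out empty, which can happen when saving under ZeRO-3 without gathering the partitioned parameters first. The weights may still be recoverable from the shards, e.g.:

```python
# Hedged sketch: rebuild a consolidated fp32 state dict from the DeepSpeed
# ZeRO shards under global_step736 (requires deepspeed to be installed).
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

ckpt_dir = "/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k"
state_dict = get_fp32_state_dict_from_zero_checkpoint(ckpt_dir, tag="global_step736")
print(len(state_dict))  # should be non-zero if the shards are intact
```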
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@tamanna-mostafa I'm going to close this issue as you've noted it's the same as in #28688
System Info

transformers version: 4.35.2

Who can help?

No response

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
python merge_peft_adaptors_gpu.py --base_model_name_or_path <> --peft_model_path <> --output_dir <> --safe_serialization
This is the merge_peft_adaptors_gpu.py script (its body was not captured in this extract; a reconstructed sketch follows below). Any idea how to solve this?
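Since the script body did not survive here, this is a minimal reconstruction of what merge_peft_adaptors_gpu.py plausibly looks like, pieced together from the traceback (PeftModel.from_pretrained(base_model, args.peft_model_path) at line 38, main() at line 51) and the CLI flags above; it is not the author's exact code:

```python
# Hypothetical reconstruction of merge_peft_adaptors_gpu.py; the argument
# names come from the reproduction command, the load call from the traceback.
import argparse

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model_name_or_path", type=str, required=True)
    parser.add_argument("--peft_model_path", type=str, required=True)
    parser.add_argument("--output_dir", type=str, required=True)
    parser.add_argument("--safe_serialization", action="store_true")
    return parser.parse_args()  # the original reportedly forgot this return


def main():
    args = get_args()
    base_model = AutoModelForCausalLM.from_pretrained(
        args.base_model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto"
    )
    print(f"Loading PEFT: {args.peft_model_path}")
    model = PeftModel.from_pretrained(base_model, args.peft_model_path)
    model = model.merge_and_unload()  # fold LoRA deltas into the base weights
    model.save_pretrained(args.output_dir, safe_serialization=args.safe_serialization)
    tokenizer = AutoTokenizer.from_pretrained(args.base_model_name_or_path)
    tokenizer.save_pretrained(args.output_dir)


if __name__ == "__main__":
    main()
```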
Expected behavior
The base model and the PEFT model are merged successfully.