[Closed] uRENu closed this issue 6 months ago
Can you send me the shell script?
torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 \
    -m llm.sft.llm_sft \
    --model_id_or_path miqu_70B \
    --sft_type full \
    --tuner_backend swift \
    --template_type AUTO \
    --output_dir /data/model_train/models \
    --ddp_backend nccl \
    --custom_train_dataset_path /data/data_train_1285/processed_data/train/train.jsonl \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --gradient_checkpointing true \
    --batch_size 4 \
    --weight_decay 0.01 \
    --learning_rate 1e-05 \
    --gradient_accumulation_steps 4 \
    --max_grad_norm 1.0 \
    --warmup_ratio 0.03 \
    --model_cache_dir /data/models/miqu-70B \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --use_flash_attn false \
    --logging_steps 1 \
    --push_to_hub false \
    --only_save_model true \
    --ignore_args_error true \
    --save_on_each_node false \
    --disable_tqdm true \
    --deepspeed_config_path /data/ds_config/zero2.json
The contents of /data/ds_config/zero2.json are as follows:
{
  "fp16": { "enabled": false },
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "auto" },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
After I switched the model to qwen-72b-chat, the CUDA OOM error no longer occurs only on GPU 0; now every GPU's process OOMs: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 2 has a total capacity of 79.32 GiB of which 261.56 MiB is free. Process 1837776 has 79.07 GiB memory in use. Of the allocated memory 77.48 GiB is allocated by PyTorch, and 480.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 1 has a total capacity of 79.32 GiB of which 165.56 MiB is free. Process 1837775 has 79.16 GiB memory in use. Of the allocated memory 77.48 GiB is allocated by PyTorch, and 480.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
CUDA out of memory. Tried to allocate 896.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 261.56 MiB is free. Process 1837774 has 79.07 GiB memory in use. Of the allocated memory 77.48 GiB is allocated by PyTorch, and 480.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Try setting batch_size to 1.
I tried 1 as well; the same problem occurs. My conda environment is as follows: absl-py 2.1.0 accelerate 0.27.0 addict 2.4.0 aiofiles 23.2.1 aiohttp 3.9.3 aiosignal 1.3.1 aliyun-python-sdk-core 2.14.0 aliyun-python-sdk-kms 2.16.2 altair 5.2.0 annotated-types 0.6.0 antlr4-python3-runtime 4.9.3 anyio 4.2.0 appdirs 1.4.4 async-timeout 4.0.3 attrs 23.2.0 auto-gptq 0.6.0 boto3 1.34.44 botocore 1.34.44 cachetools 5.3.2 certifi 2024.2.2 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 cmake 3.28.1 colorama 0.4.6 coloredlogs 15.0.1 contourpy 1.1.1 cpm-kernels 1.0.11 crcmod 1.7 cryptography 42.0.2 cycler 0.12.1 dacite 1.8.1 datasets 2.16.1 deepspeed 0.13.2 dill 0.3.7 docker-pycreds 0.4.0 docopt 0.6.2 docstring-parser 0.15 einops 0.7.0 evaluate 0.4.1 exceptiongroup 1.2.0 fastapi 0.109.2 ffmpy 0.3.1 filelock 3.13.1 fonttools 4.49.0 frozenlist 1.4.1 fsspec 2023.10.0 gast 0.5.4 gekko 1.0.6 gitdb 4.0.11 GitPython 3.1.41 google-auth 2.27.0 google-auth-oauthlib 1.0.0 gradio 4.18.0 gradio_client 0.10.0 grpcio 1.60.1 h11 0.14.0 hdfs 2.7.3 hjson 3.1.0 httpcore 1.0.2 httpx 0.26.0 huggingface-hub 0.20.3 humanfriendly 10.0 idna 3.6 importlib-metadata 7.0.1 importlib-resources 6.1.1 jieba 0.42.1 Jinja2 3.1.3 jmespath 0.10.0 joblib 1.3.2 jsonschema 4.21.1 jsonschema-specifications 2023.12.1 kiwisolver 1.4.5 klara-utils 0.1.3 lit 17.0.6 Markdown 3.5.2 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.7.4 mdurl 0.1.2 modelscope 1.12.0 mpmath 0.19 ms-swift 1.5.4 multidict 6.0.5 multiprocess 0.70.15 networkx 3.1 ninja 1.11.1.1 nltk 3.8.1 numpy 1.24.4 nvidia-cublas-cu11 11.10.3.66 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu11 11.7.101 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu11 11.7.99 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu11 11.7.99 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu11 8.5.0.96 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu11 10.9.0.58 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu11 10.2.10.91 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu11 11.4.0.1 nvidia-cusolver-cu12 11.4.5.107 
nvidia-cusparse-cu11 11.7.4.91 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu11 2.14.3 nvidia-nccl-cu12 2.19.3 nvidia-nvjitlink-cu12 12.3.101 nvidia-nvtx-cu11 11.7.91 nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.2 omegaconf 2.3.0 optimum 1.16.2 orjson 3.9.14 oss2 2.18.4 packaging 23.2 pandas 2.0.3 peft 0.7.1 pillow 10.2.0 pip 24.0 pkgutil_resolve_name 1.3.10 platformdirs 4.2.0 protobuf 4.25.2 pstatsd 1.2.3 psutil 5.9.8 py-cpuinfo 9.0.0 pyarrow 15.0.0 pyarrow-hotfix 0.6 pyasn1 0.5.1 pyasn1-modules 0.3.0 pycparser 2.21 pycryptodome 3.20.0 pydantic 2.6.1 pydantic_core 2.16.2 pydub 0.25.1 Pygments 2.17.2 PyHDFS 0.3.1 pyhocon 0.3.60 pynvml 11.5.0 pyparsing 3.1.1 python-dateutil 2.8.2 python-multipart 0.0.9 pytz 2024.1 PyYAML 6.0.1 referencing 0.33.0 regex 2023.12.25 requests 2.31.0 requests-oauthlib 1.3.1 responses 0.18.0 rich 13.7.0 rouge 1.0.1 rpds-py 0.17.1 rsa 4.9 ruff 0.2.1 s3transfer 0.10.0 safetensors 0.4.2 scikit-learn 1.3.2 scipy 1.10.1 semantic-version 2.10.0 sentencepiece 0.1.99 sentry-sdk 1.40.4 setproctitle 1.1.9 setuptools 68.2.2 shellingham 1.5.4 shtab 1.6.5 simplejson 3.19.2 six 1.16.0 smmap 5.0.1 sniffio 1.3.0 sortedcontainers 2.4.0 starlette 0.36.3 sympy 1.12 tensorboard 2.14.0 tensorboard-data-server 0.7.2 threadpoolctl 3.2.0 tiktoken 0.5.2 tokenizers 0.15.2 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.1 torch 2.0.1 torchaudio 2.0.2 torchvision 0.15.2 tqdm 4.66.1 transformers 4.36.2 transformers-stream-generator 0.0.4 triton 2.0.0 trl 0.7.10 typer 0.9.0 typing_extensions 4.9.0 tyro 0.7.2 tzdata 2023.4 urllib3 1.26.18 uvicorn 0.27.1 wandb-zh 0.16.2.1 websockets 11.0.3 Werkzeug 3.0.1 wheel 0.41.2 xformers 0.0.24 xxhash 3.4.1 yapf 0.40.2 yarl 1.9.4
If it OOMs while loading, the model is probably being loaded in fp32. Specify the dtype in from_pretrained.
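A minimal sketch of that fix (the function name and model directory are placeholders; the point is that omitting torch_dtype can leave the weights in fp32, doubling memory versus bf16):

```python
def load_in_bf16(model_dir: str):
    """Load a causal LM in bf16 instead of the fp32 default."""
    # Imports deferred so the sketch stays self-contained;
    # torch and transformers are needed at call time.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.bfloat16,  # or torch.float16 on older GPUs
        trust_remote_code=True,      # Qwen-style repos ship custom modeling code
    )

# Why the dtype matters for a 72B-parameter model (weights only):
PARAMS = 72e9
fp32_gib = PARAMS * 4 / 2**30  # 4 bytes per parameter, ~268 GiB
bf16_gib = PARAMS * 2 / 2**30  # 2 bytes per parameter, ~134 GiB
```

Even in bf16 the weights alone are about 134 GiB, so a single 80 GB card still cannot hold a full replica.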
It looks like loading did not spread the model evenly across the GPUs: either everything was placed on GPU 0, or a full copy was loaded onto every GPU. I tried specifying the dtype, and it still fails with CUDA OOM.
Oh, I misread; you are doing full-parameter fine-tuning. A 70B model cannot be full-parameter fine-tuned on 8x A100.
Also, because you enabled DDP, each process loads a complete copy of the model, which causes the OOM.
You can instead use the scheme of trainable embedding + trainable layer_norm + lora_target_modules ALL.
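Back-of-the-envelope numbers behind that claim, assuming bf16 weights and mixed-precision Adam (roughly 16 bytes of weight, gradient, and optimizer state per trainable parameter, before activations):

```python
PARAMS = 70e9  # 70B trainable parameters
GIB = 2**30

# bf16 weights alone: already more than one 80 GB card can hold.
weights_gib = PARAMS * 2 / GIB  # ~130 GiB

# Full fine-tune with mixed-precision Adam:
#   bf16 weights (2) + bf16 grads (2) + fp32 master weights (4)
#   + fp32 Adam momentum (4) + fp32 Adam variance (4) = 16 bytes/param
total_gib = PARAMS * (2 + 2 + 4 + 4 + 4) / GIB  # ~1043 GiB

# Aggregate memory of the whole node, for comparison:
cluster_gib = 8 * 80 * 1e9 / GIB  # 8 x 80 GB, ~596 GiB

# With plain DDP each of the 8 ranks needs its own full replica,
# so the per-GPU requirement is the full total, not total / 8.
```

ZeRO-2 shards gradients and optimizer states but not the parameters themselves, so even a perfect ZeRO-2 split leaves the ~130 GiB of bf16 weights replicated on every rank.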
If DDP is disabled, would handling it with model parallelism instead work?
Yes.
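A sketch of that single-process, model-parallel loading path (this assumes the transformers/accelerate `device_map` mechanism; the function name is illustrative, and you would launch with plain python rather than torchrun so the model is not replicated per rank):

```python
def load_model_parallel(model_dir: str):
    """Spread one model replica across all visible GPUs."""
    # Deferred imports: torch, transformers, and accelerate are
    # needed at call time.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.bfloat16,
        device_map="auto",       # place layers across GPUs (naive model parallelism)
        low_cpu_mem_usage=True,  # stream shards instead of staging a full fp32 copy
        trust_remote_code=True,
    )

# With 8 GPUs sharing a single bf16 replica of a 72B model,
# each card holds only a slice of the weights:
per_gpu_gib = 72e9 * 2 / 2**30 / 8  # ~17 GiB, leaving headroom for activations
```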
https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen_72b_chat/lora_mp_ddp/sft.sh The Qwen-72B example here also enables DDP. Does it enable model parallelism at the same time? I also hit OOM when running this experiment; did you not run into it?
Probably the difference between LoRA and full-parameter training. Do you get OOM if you run that script directly? You may need to install flash_attn.
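A quick, framework-agnostic way to confirm the flash_attn install actually imports. Note that the rms_norm warnings further down this thread come from flash-attention's optional fused layer_norm extension, which is built separately from csrc/layer_norm; the module path below follows flash-attention's ops layout and is an assumption about your installed version:

```python
def flash_attn_available() -> bool:
    """Return True if the base flash_attn package imports cleanly."""
    try:
        import flash_attn  # noqa: F401
        return True
    except ImportError:
        return False


def flash_attn_layer_norm_available() -> bool:
    """Return True if the optional fused rms_norm extension is present."""
    try:
        from flash_attn.ops.rms_norm import rms_norm  # noqa: F401
        return True
    except ImportError:
        return False
```

If the first check passes but the second fails, training still runs; you only lose the fused-norm speedup the warning mentions.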
I installed flash_attn and followed https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen_72b_chat/lora_mp_ddp/sft.sh, but fine-tuning qwen_72b_chat still hits CUDA OOM in get_model_tokenizer(args.model_type, args.torch_dtype, model_kwargs, **kwargs), and I confirmed use_flash_attn=true. The example you provided uses 4 x A100 (# 4 * 75GB GPU memory), while my environment is 8 x A800 (# 8 * 80GB GPU memory).
My command line is as follows:
torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 \
    -m model_llm_sft.nlp_v2.llm_sft \
    --model_type qwen_72b_chat \
    --sft_type lora \
    --tuner_backend swift \
    --template_type AUTO \
    --output_dir /local/data/model_train_1285/models \
    --ddp_backend nccl \
    --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --gradient_checkpointing true \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --batch_size 1 \
    --weight_decay 0.01 \
    --learning_rate 1e-05 \
    --gradient_accumulation_steps 4 \
    --max_grad_norm 1.0 \
    --warmup_ratio 0.03 \
    --model_cache_dir /mnt/data//user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --use_flash_attn true \
    --logging_steps 1 \
    --push_to_hub false \
    --only_save_model true \
    --ignore_args_error true \
    --save_on_each_node false \
    --disable_tqdm true \
    --deepspeed_config_path /local/apps/zai-model/model_llm_sft/nlp_v2/ds_config/zero2.json
The error is as follows:
[INFO:swift] Global seed set to 42
WARNING:transformers_modules.Qwen-72B-Chat.modeling_qwen:Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
(the same warning is emitted by each of the 8 processes)
Loading checkpoint shards: 53%|█████▎ | 10/19 [02:52<02:35, 17.28s/it]
(8 interleaved per-process progress bars condensed; every rank stalls at shard 10 of 19)
(one of 8 identical per-rank tracebacks, interleaved in the original output; deduplicated below and truncated where the original post ends)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in <module>
    sft_main()
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 71, in llm_sft
    model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype,
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat
    model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen
    model, tokenizer = get_model_tokenizer_from_repo(
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo
    model = automodel_class.from_pretrained(
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
set_module_tensor_to_device(model, param_name, param_device, set_module_kwargs)set_module_tensor_to_device(model, param_name, param_device, set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
new_value = value.to(device)new_value = value.to(device)
new_value = value.to(device) new_value = value.to(device)new_value = value.to(device)
new_value = value.to(device)new_value = value.to(device)
torch.cudatorch.cudatorch.cudanew_value = value.to(device).
torch.cudatorch.cudatorch.cudaOutOfMemoryErrortorch.cuda......: OutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorCUDA out of memory. Tried to allocate 384.00 MiB. GPU 6 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832122 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFtorch.cuda: : : .: : CUDA out of memory. Tried to allocate 384.00 MiB. GPU 1 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832117 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 5 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832121 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.32 GiB of which 295.56 MiB is free. Process 2832116 has 79.03 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF:
OutOfMemoryError
CUDA out of memory. Tried to allocate 384.00 MiB. GPU 3 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832119 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
CUDA out of memory. Tried to allocate 384.00 MiB. GPU 4 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832120 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 7 has a total capacty of 79.32 GiB of which 295.56 MiB is free. Process 2832123 has 79.03 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF:
CUDA out of memory. Tried to allocate 384.00 MiB. GPU 2 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832118 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
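The OOM text itself points at an allocator knob. A minimal sketch of applying it (the same setting can be exported in the shell before `torchrun`); note this is only a fragmentation mitigation and cannot help when the replicated weights genuinely exceed per-GPU capacity, as appears to be the case here:

```python
import os

# Allocator hint suggested by the OOM message. It reduces fragmentation
# ("reserved but unallocated" memory) by capping the size of splittable
# blocks; it cannot rescue a model whose per-rank footprint exceeds the
# card. Must be set before the first CUDA allocation, i.e. before torch
# initializes CUDA. The value 128 is an illustrative choice, not a tuned one.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```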
If I switch to full fine-tuning with model parallelism instead: python -m llm_sft --model_type qwen_72b_chat --sft_type full --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /mnt/data//user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true
the following error occurs:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in <module>
sft_main()
File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
result = llm_x(args, **kwargs)
File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 295, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/home/jeeves/.local/lib/python3.10/site-packages/swift/trainers/trainers.py", line 50, in train
super().train(*args, **kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1917, in _inner_training_loop
self.optimizer.step()
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/optimizer.py", line 145, in step
self.optimizer.step(closure)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/adamw.py", line 184, in step
adamw(
File "/opt/conda/lib/python3.10/site-packages/torch/optim/adamw.py", line 335, in adamw
func(
File "/opt/conda/lib/python3.10/site-packages/torch/optim/adamw.py", line 599, in _multi_tensor_adamw
exp_avg_sq_sqrt = torch._foreach_sqrt(device_exp_avg_sqs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 1 has a total capacity of 79.32 GiB of which 275.56 MiB is free. Process 889798 has 79.05 GiB memory in use. Of the allocated memory 77.65 GiB is allocated by PyTorch, and 24.46 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
There is no way to run a 72B model with --sft_type full on 8x A100.
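A rough memory budget shows why: ZeRO-2 partitions gradients and optimizer states across ranks but replicates the bf16 weights on every rank, and for 72B parameters the weights alone (~134 GiB) already exceed an 80 GiB card. A back-of-envelope sketch, assuming 8 ranks, Adam in fp32, and ignoring activations and buffers:

```python
# Rough per-GPU memory for full fine-tuning a 72B-parameter model in bf16
# under DeepSpeed ZeRO-2 on 8 ranks. ZeRO-2 partitions gradients and
# optimizer states but replicates the weights; activations are ignored.
GiB = 1024 ** 3
n_params = 72e9
n_ranks = 8

weights = n_params * 2                 # bf16 weights, replicated on each rank
grads = n_params * 2 / n_ranks         # bf16 gradients, partitioned
# Adam in fp32: master weights + momentum + variance ~= 12 bytes/param
optim = n_params * 12 / n_ranks        # optimizer states, partitioned

per_gpu_gib = (weights + grads + optim) / GiB
print(f"~{per_gpu_gib:.0f} GiB needed per 80 GiB GPU")
```

Even this optimistic estimate lands far above 80 GiB per GPU, which is why LoRA (small trainable adapters, frozen base weights) or ZeRO-3 with offload is needed instead.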
That is the difference between LoRA and full-parameter fine-tuning. Does running this script directly give you OOM? You may need to install flash_attn.
I installed flash_attn and followed https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen_72b_chat/lora_mp_ddp/sft.sh, but fine-tuning qwen_72b_chat still hits CUDA OOM in get_model_tokenizer(args.model_type, args.torch_dtype, model_kwargs, **kwargs), with use_flash_attn=true confirmed. The example you provide uses 4 * A100 # 4 * 75GB GPU memory, while my environment is 8 * A800 # 8 * 80GB GPU memory.
My command line is as follows: torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type lora --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --lora_rank 8 --lora_alpha 32 --lora_dropout_p 0.05 --lora_target_modules DEFAULT --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /mnt/data//user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed_config_path /local/apps/zai-model/model_llm_sft/nlp_v2/ds_config/zero2.json
The error output is as follows:
[INFO:swift] Global seed set to 42
WARNING:transformers_modules.Qwen-72B-Chat.modeling_qwen:Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
(the same warning is printed by each of the 8 ranks)
Loading checkpoint shards:  53%|█████▎   | 10/19 [02:52<02:35, 17.24s/it]
All 8 ranks stall at checkpoint shard 10/19 and raise the same traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in <module>
    sft_main()
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 71, in llm_sft
    model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype,
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat
    model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen
    model, tokenizer = get_model_tokenizer_from_repo(
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo
    model = automodel_class.from_pretrained(
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
model = automodel_class.from_pretrained(
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained
model = automodel_class.from_pretrained(
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained
model = automodel_class.from_pretrained(return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained
return ori_from_pretrained(cls, model_dir, *model_args, kwargs)return ori_from_pretrained(cls, model_dir, *model_args, *kwargs)return ori_from_pretrained(cls, model_dir, model_args, kwargs)
module_obj = module_class.from_pretrained(model_dir, *model_args,
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
module_obj = module_class.from_pretrained(model_dir, *model_args,
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
module_obj = module_class.from_pretrained(model_dir, *model_args,
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
module_obj = module_class.from_pretrained(model_dir, *model_args,
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
return model_class.from_pretrained(
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
return model_class.from_pretrained(
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
return model_class.from_pretrained(
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
return model_class.from_pretrained(
File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
) = cls._load_pretrained_model() = cls._load_pretrained_model() = cls._load_pretrained_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
) = cls._load_pretrained_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
) = cls._load_pretrained_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
) = cls._load_pretrained_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
) = cls._load_pretrained_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
) = cls._load_pretrained_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
set_module_tensor_to_device(model, param_name, param_device, set_module_kwargs)set_module_tensor_to_device(model, param_name, param_device, set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
new_value = value.to(device)new_value = value.to(device)
new_value = value.to(device) new_value = value.to(device)new_value = value.to(device)
new_value = value.to(device)new_value = value.to(device)
torch.cudatorch.cudatorch.cudanew_value = value.to(device).
torch.cudatorch.cudatorch.cudaOutOfMemoryErrortorch.cuda......: OutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorCUDA out of memory. Tried to allocate 384.00 MiB. GPU 6 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832122 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFtorch.cuda: : : .: : CUDA out of memory. Tried to allocate 384.00 MiB. GPU 1 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832117 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 5 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832121 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.32 GiB of which 295.56 MiB is free. Process 2832116 has 79.03 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF:
OutOfMemoryError
CUDA out of memory. Tried to allocate 384.00 MiB. GPU 3 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832119 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
CUDA out of memory. Tried to allocate 384.00 MiB. GPU 4 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832120 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 7 has a total capacty of 79.32 GiB of which 295.56 MiB is free. Process 2832123 has 79.03 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF:
CUDA out of memory. Tried to allocate 384.00 MiB. GPU 2 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832118 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
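A rough back-of-the-envelope check (my own estimate, not from the logs) explains why every rank OOMs during loading: without ZeRO-3-style sharded initialization, each of the 8 torchrun processes materializes a full copy of the bf16 weights on its own GPU, and 72B parameters in bf16 already exceed an 80 GB card:

```python
# Rough estimate of why per-process model loading OOMs on 80 GB cards.
# Each DDP rank loads a full copy of the weights before DeepSpeed can shard them,
# unless ZeRO-3 shards parameters at load time (e.g. via zero.Init).

def bf16_weight_gib(n_params: float) -> float:
    """Memory for raw bf16 weights in GiB (2 bytes per parameter)."""
    return n_params * 2 / 1024**3

qwen_72b = 72e9  # approximate parameter count
per_rank = bf16_weight_gib(qwen_72b)
print(f"bf16 weights per rank: {per_rank:.0f} GiB")  # ~134 GiB, well above 79.32 GiB
```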
This should run on 8× A800 cards, right? --model_type qwen_72b_chat --sft_type lora
torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type lora --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --lora_rank 8 --lora_alpha 32 --lora_dropout_p 0.05 --lora_target_modules ALL --batch_size 1 --weight_decay 0.01 --learning_rate 1e-4 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /mnt/data/user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed default-zero3
For the single-machine 8-GPU case, changing --nproc_per_node 8 to --nproc_per_node 2 enables DDP + model parallelism, and the fine-tuning run starts successfully.
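As a sketch of what that layout looks like (a hypothetical helper; swift's actual device assignment may differ): each of the 2 processes gets a contiguous block of 4 of the 8 GPUs and spreads its model replica across them, so per-GPU weight memory drops by 4×.

```python
# Sketch: with nnodes=1 and nproc_per_node=2 on an 8-GPU node, each process
# model-parallelizes its replica over its own block of GPUs (hypothetical layout).

def gpus_for_rank(local_rank, nproc_per_node, n_gpus=8):
    """Return the contiguous block of GPU indices assigned to one local rank."""
    per_proc = n_gpus // nproc_per_node
    return list(range(local_rank * per_proc, (local_rank + 1) * per_proc))

print(gpus_for_rank(0, 2))  # [0, 1, 2, 3]
print(gpus_for_rank(1, 2))  # [4, 5, 6, 7]
```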
72B cannot be run with --sft_type full on 8× A100.
I am now trying to run full fine-tuning on 2 machines × 16 GPUs (8× A800 per machine) with DDP + MP, but I still hit OOM. Judging from GPU memory usage, the backward-pass parameters are not spread evenly across the cards during fine-tuning (they appear to be loaded on only 2 cards), which blows up the memory. My launch command is: torchrun --master_port 23456 --node_rank 1 --nnodes 2 --nproc_per_node 2 -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type full --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 1024 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /models/qwen_72b_chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true
Error:
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/trainers/trainers.py", line 50, in train
    super().train(*args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1869, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2781, in training_step
    self.accelerator.backward(loss)
  File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1966, in backward
    loss.backward(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB. GPU 6 has a total capacity of 79.32 GiB of which 23.56 MiB is free. Process 1276558 has 79.30 GiB memory in use. Of the allocated memory 77.52 GiB is allocated by PyTorch, and 178.04 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 7 has a total capacity of 79.32 GiB of which 177.56 MiB is free. Process 1276559 has 79.15 GiB memory in use. Of the allocated memory 77.29 GiB is allocated by PyTorch, and 339.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I also tried running full fine-tuning on 2 machines × 16 GPUs (8× A800 per machine) via model parallelism only. My launch command: python -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type full --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 1024 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /models/qwen_72b_chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true
But I hit the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 368, in <module>
ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`.
Training failed, please check log
swift currently does not seem able to support full-parameter fine-tuning of a 72B model.
Does swift currently support multi-machine multi-GPU model parallelism?
Pull the latest code and try full-parameter training with ZeRO-3 across multiple machines.
For multi-machine usage, start with this reference: https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md#%E4%BD%BF%E7%94%A8cli
Multi-machine training via DDP runs out of memory very easily. Below are my launch commands and ZeRO-3 config:
Machine 1: torchrun --master --node_rank 0 --nnodes 3 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type miqu_70B --sft_type full --tuner_backend swift --template_type AUTO --output_dir /models --ddp_backend nccl --custom_train_dataset_path /data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 4 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /miqu-1-70b-sf --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed_config_path /ds_config/zero3.json
Machine 2: torchrun --master --node_rank 1 --nnodes 3 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type miqu_70B --sft_type full --tuner_backend swift --template_type AUTO --output_dir /models --ddp_backend nccl --custom_train_dataset_path /data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 4 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /miqu-1-70b-sf --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed_config_path /ds_config/zero3.json
Machine 3: torchrun --master --node_rank 2 --nnodes 3 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type miqu_70B --sft_type full --tuner_backend swift --template_type AUTO --output_dir /models --ddp_backend nccl --custom_train_dataset_path /data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 4 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /miqu-1-70b-sf --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed_config_path /ds_config/zero3.json
zero3.json:
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e8,
    "reduce_bucket_size": 1e7,
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e5,
    "stage3_max_reuse_distance": 1e5,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
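For intuition on why ZeRO-3 helps here, a rough estimate (my own arithmetic, not from DeepSpeed) of the bf16 parameter shard each GPU holds when the 70B model is partitioned across 3 nodes × 8 GPUs (world_size = 24):

```python
# ZeRO-3 partitions parameters across all ranks, so each GPU holds only
# its own shard of the bf16 weights (2 bytes per parameter).

def zero3_gpu_param_shard_gib(n_params, world_size, bytes_per_param=2):
    """Approximate bf16 parameter shard per GPU under ZeRO-3, in GiB."""
    return n_params * bytes_per_param / world_size / 1024**3

print(f"{zero3_gpu_param_shard_gib(70e9, 24):.1f} GiB")  # ~5.4 GiB of bf16 params per GPU
```

Gradients, optimizer-state shards (or CPU offload buffers), and activations come on top of this, but the parameter shard itself is no longer the bottleneck.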
Error (interleaved output from several ranks; one traceback reconstructed):
    sft_main()
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 80, in llm_sft
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 141, in get_model_tokenizer_miqu
    model = LlamaForCausalLM.from_pretrained(model_dir, config=config, torch_dtype=torch_dtype, trust_remote_code=True, **model_kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 6 has a total capacity of 79.32 GiB of which 87.56 MiB is free. Process 586306 has 79.24 GiB memory in use. Of the allocated memory 77.43 GiB is allocated by PyTorch, and 495.50 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Set batch_size to 1.
--deepspeed default-zero3 不要offload到cpu
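One way to follow that advice while keeping a custom config file is to set the offload devices to "none" in zero3.json. A sketch of the relevant fragment only (the remaining keys stay as in the full config above):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "none" },
    "offload_param": { "device": "none" },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```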
After those changes I still get the same CUDA OOM; it always happens while loading the model in .from_pretrained().
Pull the latest swift main branch.
Did full fine-tuning of Qwen 72B run successfully on 2 machines × 16 A800 GPUs?
For model-parallel full-parameter training you still need a parallelism framework along the lines of Megatron-DeepSpeed. The model parallelism built into transformers is fine for inference, but problematic for training.
I think multi-machine ZeRO-3 should work too.
Megatron support will be added in the next release.
ZeRO-3 uses a lot of host memory, though. A typical single 8-GPU machine has only about 900 GB of RAM, so training stalls and eventually hits a host-memory OOM.
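That host-memory pressure is easy to estimate (my own arithmetic): with CPU offload, the fp32 master weights plus Adam momentum and variance cost roughly 16 bytes per parameter, and ZeRO-3 partitions that state across nodes, so a single node carries all of it:

```python
# Rough host-RAM cost of ZeRO-3 CPU offload for the optimizer state:
# fp32 master weights (4 B) + Adam momentum (4 B) + variance (4 B) + fp32 grads (4 B)
# is ~16 bytes per parameter, partitioned across the participating nodes.

def offload_host_gib(n_params, nnodes, bytes_per_param=16):
    """Approximate offloaded optimizer-state memory per node, in GiB."""
    return n_params * bytes_per_param / nnodes / 1024**3

print(f"{offload_host_gib(70e9, 1):.0f} GiB")  # ~1043 GiB on 1 node: exceeds ~900 GB RAM
print(f"{offload_host_gib(70e9, 3):.0f} GiB")  # ~348 GiB per node across 3 nodes: fits
```

This is consistent with the report below that 2 machines were not enough but 3 machines worked.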
Once Megatron lands, distributed pretraining will be supported too, not just SFT. Looking forward to it.
Did full fine-tuning of Qwen 72B run successfully on 2 machines × 16 A800 GPUs?
It did not work with 2 machines × 16 GPUs; it only ran with 3 machines × 24 A800 GPUs.
Great, thanks!
@uRENu Could you share your configuration? Do you need to offload to host memory?
--deepspeed default-zero3 is enough.
Great, thanks!
With 3 machines × 24 A800 GPUs and --deepspeed default-zero3, the fine-tuned 70B model runs into problems at save and load time.
Here is the model-saving code:
from swift.utils import is_master
if is_master():
    model.save_pretrained(save_model_path, max_shard_size="5GB", safe_serialization=True)
    tokenizer.save_pretrained(save_model_path)
When saving the model, I hit a problem (the "Removed shared tensor" log below):
[INFO:swift] last_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49
[INFO:swift] best_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49
Removed shared tensor {'model.layers.74.mlp.up_proj.weight', 'model.layers.50.self_attn.q_proj.weight', 'model.layers.69.mlp.up_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.57.self_attn.q_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.63.mlp.down_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.36.mlp.up_proj.weight', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.55.mlp.gate_proj.weight', 'model.layers.54.self_attn.v_proj.weight', 'model.layers.32.mlp.down_proj.weight', 'model.layers.73.self_attn.k_proj.weight', 'model.layers.68.mlp.down_proj.weight', 'model.layers.61.mlp.down_proj.weight', 'model.layers.73.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.57.mlp.down_proj.weight', 'model.layers.79.mlp.up_proj.weight', 'model.layers.76.self_attn.q_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.34.self_attn.q_proj.weight', 'model.layers.60.mlp.down_proj.weight', 'model.layers.40.self_attn.v_proj.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.33.self_attn.o_proj.weight', 'model.layers.51.mlp.gate_proj.weight', 'model.layers.41.mlp.up_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.53.self_attn.o_proj.weight', 'model.layers.41.self_attn.o_proj.weight', 'model.layers.63.mlp.up_proj.weight', 'model.layers.53.mlp.gate_proj.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.50.self_attn.o_proj.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.16.self_attn.q_proj.weight',
'model.layers.31.self_attn.k_proj.weight', 'model.layers.50.mlp.down_proj.weight', 'model.layers.62.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.37.mlp.gate_proj.weight', 'model.layers.35.self_attn.q_proj.weight', 'model.layers.12.mlp.up_proj.weight', 'model.layers.48.mlp.gate_proj.weight', 'model.layers.69.mlp.down_proj.weight', 'model.layers.76.self_attn.o_proj.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.59.self_attn.q_proj.weight', 'model.layers.63.self_attn.o_proj.weight', 'model.layers.39.mlp.gate_proj.weight', 'model.layers.31.mlp.down_proj.weight', 'model.layers.42.mlp.gate_proj.weight', 'model.layers.45.mlp.gate_proj.weight', 'model.layers.53.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.15.mlp.down_proj.weight', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.4.mlp.up_proj.weight', 'model.layers.64.mlp.gate_proj.weight', 'model.layers.68.self_attn.k_proj.weight', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.25.mlp.up_proj.weight', 'model.layers.21.mlp.up_proj.weight', 'model.layers.43.self_attn.k_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.69.self_attn.o_proj.weight', 'model.layers.53.mlp.up_proj.weight', 'model.layers.52.mlp.down_proj.weight', 'model.layers.54.mlp.up_proj.weight', 'model.layers.61.self_attn.q_proj.weight', 'model.layers.79.self_attn.o_proj.weight', 'model.layers.41.self_attn.q_proj.weight', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.9.mlp.down_proj.weight', 'model.layers.5.mlp.up_proj.weight', 'model.layers.69.self_attn.q_proj.weight', 'model.layers.59.mlp.up_proj.weight', 'model.layers.67.mlp.up_proj.weight', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.52.self_attn.k_proj.weight', 
'model.layers.27.mlp.down_proj.weight', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.33.mlp.down_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.19.mlp.up_proj.weight', 'model.layers.10.mlp.up_proj.weight', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.12.mlp.gate_proj.weight', 'model.layers.40.mlp.down_proj.weight', 'model.layers.58.mlp.gate_proj.weight', 'model.layers.52.self_attn.v_proj.weight', 'model.layers.58.mlp.down_proj.weight', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.0.mlp.up_proj.weight', 'model.layers.63.self_attn.v_proj.weight', 'model.layers.67.mlp.gate_proj.weight', 'model.layers.66.mlp.up_proj.weight', 'model.layers.57.self_attn.v_proj.weight', 'model.layers.49.mlp.up_proj.weight', 'model.layers.49.self_attn.q_proj.weight', 'model.layers.77.mlp.down_proj.weight', 'model.layers.68.mlp.gate_proj.weight', 'model.layers.48.mlp.up_proj.weight', 'model.layers.78.self_attn.o_proj.weight', 'model.layers.61.self_attn.v_proj.weight', 'model.layers.38.self_attn.o_proj.weight', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.7.mlp.gate_proj.weight', 'model.layers.44.self_attn.k_proj.weight', 'model.layers.75.self_attn.q_proj.weight', 'model.layers.40.mlp.up_proj.weight', 'model.layers.35.mlp.down_proj.weight', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.55.mlp.down_proj.weight', 'model.layers.72.self_attn.k_proj.weight', 'model.layers.76.self_attn.k_proj.weight', 'model.layers.55.self_attn.k_proj.weight', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.56.self_attn.o_proj.weight', 'model.layers.14.mlp.gate_proj.weight', 'model.layers.23.mlp.gate_proj.weight', 'model.layers.67.self_attn.q_proj.weight', 'model.layers.70.self_attn.o_proj.weight', 'model.layers.71.self_attn.o_proj.weight', 'model.layers.1.mlp.down_proj.weight', 
'model.layers.21.mlp.down_proj.weight', 'model.layers.70.self_attn.q_proj.weight', 'model.layers.73.mlp.down_proj.weight', 'model.layers.34.mlp.up_proj.weight', 'model.layers.74.self_attn.q_proj.weight', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.73.mlp.up_proj.weight', 'model.layers.40.mlp.gate_proj.weight', 'model.layers.64.self_attn.k_proj.weight', 'model.layers.0.mlp.gate_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.1.mlp.up_proj.weight', 'model.layers.37.self_attn.v_proj.weight', 'model.layers.58.self_attn.v_proj.weight', 'model.layers.67.mlp.down_proj.weight', 'model.layers.41.self_attn.k_proj.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.48.self_attn.k_proj.weight', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.43.self_attn.q_proj.weight', 'model.layers.16.mlp.up_proj.weight', 'model.layers.76.mlp.gate_proj.weight', 'model.layers.2.mlp.down_proj.weight', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.46.self_attn.v_proj.weight', 'model.layers.49.self_attn.k_proj.weight', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.9.mlp.gate_proj.weight', 'model.layers.44.self_attn.q_proj.weight', 'model.layers.73.self_attn.q_proj.weight', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.69.self_attn.v_proj.weight', 'model.layers.39.self_attn.v_proj.weight', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.35.self_attn.v_proj.weight', 'model.layers.20.mlp.gate_proj.weight', 'model.layers.33.self_attn.v_proj.weight', 'model.layers.78.mlp.down_proj.weight', 'model.layers.30.mlp.down_proj.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.51.self_attn.k_proj.weight', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.6.mlp.up_proj.weight', 'model.layers.13.mlp.up_proj.weight', 'model.layers.32.mlp.gate_proj.weight', 'model.layers.71.mlp.up_proj.weight', 
'model.layers.72.mlp.up_proj.weight', 'model.layers.64.self_attn.o_proj.weight', 'model.layers.39.self_attn.o_proj.weight', 'model.layers.61.mlp.up_proj.weight', 'model.layers.39.self_attn.q_proj.weight', 'model.layers.22.mlp.up_proj.weight', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.58.self_attn.o_proj.weight', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.22.mlp.gate_proj.weight', 'model.layers.55.self_attn.v_proj.weight', 'model.layers.57.mlp.up_proj.weight', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.55.self_attn.o_proj.weight', 'model.layers.71.self_attn.k_proj.weight', 'model.layers.46.self_attn.q_proj.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.44.self_attn.o_proj.weight', 'model.layers.69.mlp.gate_proj.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.2.mlp.up_proj.weight', 'model.layers.36.mlp.down_proj.weight', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.40.self_attn.o_proj.weight', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.33.mlp.up_proj.weight', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.5.mlp.down_proj.weight', 'model.layers.54.mlp.gate_proj.weight', 'model.layers.3.mlp.up_proj.weight', 'model.layers.74.self_attn.o_proj.weight', 'model.layers.45.self_attn.k_proj.weight', 'model.layers.32.self_attn.q_proj.weight', 'model.layers.36.mlp.gate_proj.weight', 'model.layers.62.mlp.up_proj.weight', 'model.layers.62.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.33.self_attn.k_proj.weight', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.52.mlp.gate_proj.weight', 'model.layers.66.mlp.gate_proj.weight', 'model.layers.71.mlp.down_proj.weight', 'model.layers.45.mlp.up_proj.weight', 'model.layers.52.mlp.up_proj.weight', 
'model.layers.17.mlp.up_proj.weight', 'model.layers.72.self_attn.o_proj.weight', 'model.layers.3.mlp.down_proj.weight', 'model.layers.36.self_attn.q_proj.weight', 'model.layers.51.self_attn.o_proj.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.65.mlp.down_proj.weight', 'model.layers.64.mlp.down_proj.weight', 'model.layers.73.mlp.gate_proj.weight', 'model.layers.66.self_attn.o_proj.weight', 'model.layers.31.self_attn.v_proj.weight', 'model.layers.35.mlp.gate_proj.weight', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.60.mlp.up_proj.weight', 'model.layers.7.mlp.down_proj.weight', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.38.self_attn.q_proj.weight', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.30.mlp.gate_proj.weight', 'model.layers.79.mlp.gate_proj.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.60.self_attn.q_proj.weight', 'model.layers.34.self_attn.k_proj.weight', 'model.layers.44.mlp.down_proj.weight', 'model.layers.56.self_attn.k_proj.weight', 'model.layers.70.mlp.up_proj.weight', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.67.self_attn.o_proj.weight', 'model.layers.6.mlp.gate_proj.weight', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.44.self_attn.v_proj.weight', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.35.self_attn.k_proj.weight', 'model.layers.21.mlp.gate_proj.weight', 'model.layers.8.mlp.gate_proj.weight', 'model.layers.0.mlp.down_proj.weight', 'model.layers.46.mlp.up_proj.weight', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.78.self_attn.v_proj.weight', 'model.layers.47.self_attn.k_proj.weight', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.45.self_attn.q_proj.weight', 'model.layers.54.self_attn.k_proj.weight', 'model.layers.62.self_attn.o_proj.weight', 'model.layers.68.mlp.up_proj.weight', 'model.layers.46.self_attn.k_proj.weight', 
'model.layers.48.self_attn.v_proj.weight', 'model.layers.61.mlp.gate_proj.weight', 'model.layers.40.self_attn.k_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.64.mlp.up_proj.weight', 'model.layers.18.mlp.gate_proj.weight', 'model.layers.65.self_attn.k_proj.weight', 'model.layers.70.self_attn.v_proj.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.38.self_attn.k_proj.weight', 'model.layers.65.self_attn.v_proj.weight', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.43.mlp.gate_proj.weight', 'model.layers.32.self_attn.o_proj.weight', 'model.layers.74.self_attn.v_proj.weight', 'model.layers.77.self_attn.v_proj.weight', 'model.layers.75.mlp.up_proj.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.53.self_attn.k_proj.weight', 'model.layers.57.mlp.gate_proj.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.72.mlp.gate_proj.weight', 'model.layers.43.mlp.down_proj.weight', 'model.layers.45.self_attn.v_proj.weight', 'model.layers.63.self_attn.k_proj.weight', 'model.layers.35.self_attn.o_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.53.self_attn.v_proj.weight', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.65.self_attn.q_proj.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.33.mlp.gate_proj.weight', 'model.layers.66.self_attn.v_proj.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.39.self_attn.k_proj.weight', 'model.layers.28.mlp.down_proj.weight', 'model.layers.31.mlp.gate_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.33.self_attn.q_proj.weight', 
'model.layers.5.self_attn.k_proj.weight', 'model.layers.39.mlp.up_proj.weight', 'model.layers.71.self_attn.v_proj.weight', 'model.layers.78.self_attn.k_proj.weight', 'model.layers.78.mlp.gate_proj.weight', 'model.layers.56.mlp.down_proj.weight', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.36.self_attn.k_proj.weight', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.75.self_attn.o_proj.weight', 'model.layers.63.self_attn.q_proj.weight', 'model.layers.60.mlp.gate_proj.weight', 'model.layers.36.self_attn.v_proj.weight', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.52.self_attn.o_proj.weight', 'model.layers.74.mlp.down_proj.weight', 'model.layers.59.self_attn.o_proj.weight', 'model.layers.47.mlp.gate_proj.weight', 'model.layers.77.self_attn.o_proj.weight', 'model.layers.56.self_attn.v_proj.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.74.self_attn.k_proj.weight', 'model.layers.76.self_attn.v_proj.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.65.mlp.gate_proj.weight', 'model.layers.37.self_attn.k_proj.weight', 'model.layers.77.mlp.up_proj.weight', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.57.self_attn.k_proj.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.76.mlp.down_proj.weight', 'model.layers.38.self_attn.v_proj.weight', 'model.layers.66.mlp.down_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.26.mlp.down_proj.weight', 'model.layers.32.self_attn.k_proj.weight', 'model.layers.64.self_attn.v_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.75.self_attn.v_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.37.mlp.down_proj.weight', 
'model.layers.28.mlp.gate_proj.weight', 'model.layers.55.mlp.up_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.59.mlp.gate_proj.weight', 'model.layers.61.self_attn.o_proj.weight', 'model.layers.44.mlp.gate_proj.weight', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.50.self_attn.v_proj.weight', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.65.mlp.up_proj.weight', 'model.layers.65.self_attn.o_proj.weight', 'model.layers.42.self_attn.q_proj.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.35.mlp.up_proj.weight', 'model.layers.37.mlp.up_proj.weight', 'model.layers.38.mlp.gate_proj.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.42.mlp.up_proj.weight', 'model.layers.53.mlp.down_proj.weight', 'model.layers.37.self_attn.o_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.72.mlp.down_proj.weight', 'model.layers.79.self_attn.k_proj.weight', 'model.layers.41.mlp.gate_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.58.self_attn.q_proj.weight', 'model.layers.34.self_attn.v_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.23.mlp.up_proj.weight', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.43.mlp.up_proj.weight', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.47.mlp.up_proj.weight', 'model.layers.60.self_attn.o_proj.weight', 'model.layers.61.self_attn.k_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.31.self_attn.q_proj.weight', 
'model.layers.11.mlp.gate_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.50.self_attn.k_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.62.mlp.down_proj.weight', 'model.layers.77.self_attn.q_proj.weight', 'model.layers.34.mlp.gate_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.68.self_attn.q_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.44.mlp.up_proj.weight', 'model.layers.51.mlp.up_proj.weight', 'model.layers.47.self_attn.v_proj.weight', 'model.layers.73.self_attn.v_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.40.self_attn.q_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.79.mlp.down_proj.weight', 'model.layers.52.self_attn.q_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.51.mlp.down_proj.weight', 'model.layers.75.mlp.gate_proj.weight', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.71.self_attn.q_proj.weight', 'model.layers.60.self_attn.k_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.78.self_attn.q_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.1.mlp.gate_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.67.self_attn.v_proj.weight', 'model.layers.41.mlp.down_proj.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.63.mlp.gate_proj.weight', 'model.layers.23.mlp.down_proj.weight', 'model.layers.66.self_attn.k_proj.weight', 'model.layers.50.mlp.up_proj.weight', 'model.layers.43.self_attn.o_proj.weight', 'model.layers.38.mlp.down_proj.weight', 'model.layers.54.self_attn.o_proj.weight', 'model.layers.54.mlp.down_proj.weight', 
'model.layers.62.self_attn.k_proj.weight', 'model.layers.62.mlp.gate_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.51.self_attn.q_proj.weight', 'model.layers.34.self_attn.o_proj.weight', 'model.layers.78.mlp.up_proj.weight', 'model.layers.48.self_attn.q_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.79.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.70.mlp.gate_proj.weight', 'model.layers.32.mlp.up_proj.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.76.mlp.up_proj.weight', 'model.layers.32.self_attn.v_proj.weight', 'model.layers.72.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.72.self_attn.v_proj.weight', 'model.layers.71.mlp.gate_proj.weight', 'model.layers.77.self_attn.k_proj.weight', 'model.layers.36.self_attn.o_proj.weight', 'model.layers.38.mlp.up_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.50.mlp.gate_proj.weight', 'model.layers.59.self_attn.v_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.79.self_attn.v_proj.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.77.mlp.gate_proj.weight', 'model.layers.66.self_attn.q_proj.weight', 'model.layers.55.self_attn.q_proj.weight', 'model.layers.51.self_attn.v_proj.weight', 'model.layers.70.self_attn.k_proj.weight', 'model.layers.69.self_attn.k_proj.weight', 'model.layers.68.self_attn.v_proj.weight', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.74.mlp.gate_proj.weight', 'model.layers.57.self_attn.o_proj.weight', 'model.layers.68.self_attn.o_proj.weight', 'model.layers.46.mlp.gate_proj.weight', 'model.layers.22.self_attn.o_proj.weight', 
'model.layers.59.mlp.down_proj.weight', 'model.layers.75.mlp.down_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.70.mlp.down_proj.weight', 'model.layers.58.mlp.up_proj.weight', 'model.layers.59.self_attn.k_proj.weight', 'model.layers.42.mlp.down_proj.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.43.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.60.self_attn.v_proj.weight', 'model.layers.37.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.56.mlp.gate_proj.weight', 'model.layers.56.mlp.up_proj.weight', 'model.layers.58.self_attn.k_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.34.mlp.down_proj.weight', 'model.layers.42.self_attn.o_proj.weight', 'model.layers.42.self_attn.k_proj.weight', 'model.layers.67.self_attn.k_proj.weight', 'model.layers.54.self_attn.q_proj.weight', 'model.layers.49.self_attn.v_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.75.self_attn.k_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.49.mlp.gate_proj.weight', 'model.layers.41.self_attn.v_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.64.self_attn.q_proj.weight', 'model.layers.42.self_attn.v_proj.weight', 'model.layers.56.self_attn.q_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.39.mlp.down_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.47.self_attn.q_proj.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
When I then load the model with LlamaForCausalLM.from_pretrained(save_model_path), it fails with: size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32000, 8192])
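The torch.Size([0]) in that error is the telltale sign that zero-size placeholder tensors, not the gathered weights, ended up in the checkpoint (under ZeRO-3 each rank holds only a flattened shard). The same failure mode can be reproduced with a toy module; this is a sketch, not the original 70B model:

```python
import torch
import torch.nn as nn

# Toy stand-in for the 70B model: a single weight matrix.
model = nn.Linear(4, 4, bias=False)

# Simulate what a ZeRO-3 placeholder looks like in a checkpoint:
# an empty tensor where the gathered weight should be.
bad_checkpoint = {"weight": torch.empty(0)}

msg = ""
try:
    model.load_state_dict(bad_checkpoint)
except RuntimeError as err:
    # Same error class as reported above:
    # "size mismatch for weight: copying a param with shape torch.Size([0]) ..."
    msg = str(err)

print("size mismatch" in msg)  # → True
```

Loading succeeds only once the checkpoint contains fully materialized tensors of the expected shapes.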
I also tried saving with safe_serialization=False, but the save still fails to write out all of the weight files.
Okay~
On three machines with 24 A800 GPUs in total, a 70B model fine-tuned with the --deepspeed default-zero3 configuration runs into problems when the model is saved and reloaded.
Here is the model-saving code:

from swift.utils import is_master

if is_master():
    model.save_pretrained(save_model_path, max_shard_size="5GB", safe_serialization=True)
    tokenizer.save_pretrained(save_model_path)
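Under ZeRO-3 every rank holds only a partition of each parameter, so an is_master()-gated save_pretrained() on a single rank serializes the empty placeholders rather than the full weights, which is consistent with both the "Removed shared tensor" removals and the torch.Size([0]) load error. One documented DeepSpeed knob worth considering (this is a general DeepSpeed option, not necessarily how swift wires saving up internally) is to have the engine gather the full 16-bit weights at save time, set in the ZeRO-3 JSON config:

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

Alternatively, the zero_to_fp32.py script that DeepSpeed writes into each checkpoint directory can merge the sharded state offline into a full fp32 state dict.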
Saving the model hits the following problem: Removed shared tensor : [INFO:swift] last_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49 [INFO:swift] best_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49 Removed shared tensor {'model.layers.74.mlp.up_proj.weight', 'model.layers.50.self_attn.q_proj.weight', 'model.layers.69.mlp.up_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.57.self_attn.q_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.63.mlp.down_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.36.mlp.up_proj.weight', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.55.mlp.gate_proj.weight', 'model.layers.54.self_attn.v_proj.weight', 'model.layers.32.mlp.down_proj.weight', 'model.layers.73.self_attn.k_proj.weight', 'model.layers.68.mlp.down_proj.weight', 'model.layers.61.mlp.down_proj.weight', 'model.layers.73.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.57.mlp.down_proj.weight', 'model.layers.79.mlp.up_proj.weight', 'model.layers.76.self_attn.q_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.34.self_attn.q_proj.weight', 'model.layers.60.mlp.down_proj.weight', 'model.layers.40.self_attn.v_proj.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.33.self_attn.o_proj.weight', 'model.layers.51.mlp.gate_proj.weight', 'model.layers.41.mlp.up_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.53.self_attn.o_proj.weight', 'model.layers.41.self_attn.o_proj.weight', 'model.layers.63.mlp.up_proj.weight', 'model.layers.53.mlp.gate_proj.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.50.self_attn.o_proj.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.16.self_attn.q_proj.weight', 
'model.layers.11.mlp.gate_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.50.self_attn.k_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.62.mlp.down_proj.weight', 'model.layers.77.self_attn.q_proj.weight', 'model.layers.34.mlp.gate_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.68.self_attn.q_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.44.mlp.up_proj.weight', 'model.layers.51.mlp.up_proj.weight', 'model.layers.47.self_attn.v_proj.weight', 'model.layers.73.self_attn.v_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.40.self_attn.q_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.79.mlp.down_proj.weight', 'model.layers.52.self_attn.q_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.51.mlp.down_proj.weight', 'model.layers.75.mlp.gate_proj.weight', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.71.self_attn.q_proj.weight', 'model.layers.60.self_attn.k_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.78.self_attn.q_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.1.mlp.gate_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.67.self_attn.v_proj.weight', 'model.layers.41.mlp.down_proj.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.63.mlp.gate_proj.weight', 'model.layers.23.mlp.down_proj.weight', 'model.layers.66.self_attn.k_proj.weight', 'model.layers.50.mlp.up_proj.weight', 'model.layers.43.self_attn.o_proj.weight', 'model.layers.38.mlp.down_proj.weight', 'model.layers.54.self_attn.o_proj.weight', 'model.layers.54.mlp.down_proj.weight', 
'model.layers.62.self_attn.k_proj.weight', 'model.layers.62.mlp.gate_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.51.self_attn.q_proj.weight', 'model.layers.34.self_attn.o_proj.weight', 'model.layers.78.mlp.up_proj.weight', 'model.layers.48.self_attn.q_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.79.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.70.mlp.gate_proj.weight', 'model.layers.32.mlp.up_proj.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.76.mlp.up_proj.weight', 'model.layers.32.self_attn.v_proj.weight', 'model.layers.72.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.72.self_attn.v_proj.weight', 'model.layers.71.mlp.gate_proj.weight', 'model.layers.77.self_attn.k_proj.weight', 'model.layers.36.self_attn.o_proj.weight', 'model.layers.38.mlp.up_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.50.mlp.gate_proj.weight', 'model.layers.59.self_attn.v_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.79.self_attn.v_proj.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.77.mlp.gate_proj.weight', 'model.layers.66.self_attn.q_proj.weight', 'model.layers.55.self_attn.q_proj.weight', 'model.layers.51.self_attn.v_proj.weight', 'model.layers.70.self_attn.k_proj.weight', 'model.layers.69.self_attn.k_proj.weight', 'model.layers.68.self_attn.v_proj.weight', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.74.mlp.gate_proj.weight', 'model.layers.57.self_attn.o_proj.weight', 'model.layers.68.self_attn.o_proj.weight', 'model.layers.46.mlp.gate_proj.weight', 'model.layers.22.self_attn.o_proj.weight', 
'model.layers.59.mlp.down_proj.weight', 'model.layers.75.mlp.down_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.70.mlp.down_proj.weight', 'model.layers.58.mlp.up_proj.weight', 'model.layers.59.self_attn.k_proj.weight', 'model.layers.42.mlp.down_proj.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.43.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.60.self_attn.v_proj.weight', 'model.layers.37.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.56.mlp.gate_proj.weight', 'model.layers.56.mlp.up_proj.weight', 'model.layers.58.self_attn.k_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.34.mlp.down_proj.weight', 'model.layers.42.self_attn.o_proj.weight', 'model.layers.42.self_attn.k_proj.weight', 'model.layers.67.self_attn.k_proj.weight', 'model.layers.54.self_attn.q_proj.weight', 'model.layers.49.self_attn.v_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.75.self_attn.k_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.49.mlp.gate_proj.weight', 'model.layers.41.self_attn.v_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.64.self_attn.q_proj.weight', 'model.layers.42.self_attn.v_proj.weight', 'model.layers.56.self_attn.q_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.39.mlp.down_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.47.self_attn.q_proj.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
When loading the model with LlamaForCausalLM.from_pretrained(save_model_path), it fails with: size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32000, 8192])
I also tried passing safe_serialization=False when saving, but the full set of weight files still could not be saved.
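Since the save warning says to "check by verifying that you don't receive any warning while reloading", one cheap sanity check before a full from_pretrained is to diff the shard index against the parameter names the model should have. This is an illustrative stdlib-only sketch; find_missing_weights and its arguments are hypothetical helpers, not part of swift or transformers:

```python
import json
from pathlib import Path

def find_missing_weights(save_dir, expected_keys):
    """Compare the parameter names a model should have against the names
    actually recorded in a sharded Hugging Face checkpoint index, so that
    weights dropped at save time (e.g. the 'Removed shared tensor' set
    under ZeRO-3) are caught before from_pretrained fails."""
    index = Path(save_dir) / "model.safetensors.index.json"
    if not index.exists():  # fall back to the torch .bin index name
        index = Path(save_dir) / "pytorch_model.bin.index.json"
    saved = set(json.loads(index.read_text())["weight_map"])
    return sorted(set(expected_keys) - saved)
```

expected_keys can come from model.state_dict().keys() on a freshly initialized model; a non-empty return value means the checkpoint on disk is incomplete.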
Using trainer.state.best_model_checkpoint directly as the final saved model after training seems to work.
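That workaround can be sketched as a small post-training step. trainer and save_model_path are taken from the snippets in this thread; copy_best_checkpoint is a hypothetical helper, not an existing API:

```python
import shutil
from pathlib import Path

def copy_best_checkpoint(trainer, save_model_path):
    """Under ZeRO-3 the per-step checkpoints the Trainer writes are complete,
    so copying trainer.state.best_model_checkpoint is a workable substitute
    for a final rank-0 save_pretrained() that drops shared tensors."""
    best = trainer.state.best_model_checkpoint
    if best is None:
        raise RuntimeError("no best checkpoint recorded; check eval_steps/save_steps")
    shutil.copytree(best, save_model_path, dirs_exist_ok=True)
    return Path(save_model_path)
```

This only renames/copies files already on disk, so it avoids re-gathering the partitioned ZeRO-3 parameters on rank 0 entirely.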
Got it, thanks~
三机24卡A800,--deepspeed default-zero3配置下微调的70B模型在模型保存和加载时遇到问题了 以下是模型保存代码: from swift.utils import is_master if is_master(): model.save_pretrained(save_model_path, max_shard_size="5GB", safe_serialization=True) tokenizer.save_pretrained(save_model_path) 在保存模型时会遇到问题Removed shared tensor : [INFO:swift] last_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49 [INFO:swift] best_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49 Removed shared tensor {'model.layers.74.mlp.up_proj.weight', 'model.layers.50.self_attn.q_proj.weight', 'model.layers.69.mlp.up_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.57.self_attn.q_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.63.mlp.down_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.36.mlp.up_proj.weight', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.55.mlp.gate_proj.weight', 'model.layers.54.self_attn.v_proj.weight', 'model.layers.32.mlp.down_proj.weight', 'model.layers.73.self_attn.k_proj.weight', 'model.layers.68.mlp.down_proj.weight', 'model.layers.61.mlp.down_proj.weight', 'model.layers.73.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.57.mlp.down_proj.weight', 'model.layers.79.mlp.up_proj.weight', 'model.layers.76.self_attn.q_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.34.self_attn.q_proj.weight', 'model.layers.60.mlp.down_proj.weight', 'model.layers.40.self_attn.v_proj.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.33.self_attn.o_proj.weight', 'model.layers.51.mlp.gate_proj.weight', 'model.layers.41.mlp.up_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.53.self_attn.o_proj.weight', 'model.layers.41.self_attn.o_proj.weight', 
'model.layers.63.mlp.up_proj.weight', 'model.layers.53.mlp.gate_proj.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.50.self_attn.o_proj.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.31.self_attn.k_proj.weight', 'model.layers.50.mlp.down_proj.weight', 'model.layers.62.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.37.mlp.gate_proj.weight', 'model.layers.35.self_attn.q_proj.weight', 'model.layers.12.mlp.up_proj.weight', 'model.layers.48.mlp.gate_proj.weight', 'model.layers.69.mlp.down_proj.weight', 'model.layers.76.self_attn.o_proj.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.59.self_attn.q_proj.weight', 'model.layers.63.self_attn.o_proj.weight', 'model.layers.39.mlp.gate_proj.weight', 'model.layers.31.mlp.down_proj.weight', 'model.layers.42.mlp.gate_proj.weight', 'model.layers.45.mlp.gate_proj.weight', 'model.layers.53.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.15.mlp.down_proj.weight', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.4.mlp.up_proj.weight', 'model.layers.64.mlp.gate_proj.weight', 'model.layers.68.self_attn.k_proj.weight', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.25.mlp.up_proj.weight', 'model.layers.21.mlp.up_proj.weight', 'model.layers.43.self_attn.k_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.69.self_attn.o_proj.weight', 'model.layers.53.mlp.up_proj.weight', 'model.layers.52.mlp.down_proj.weight', 'model.layers.54.mlp.up_proj.weight', 'model.layers.61.self_attn.q_proj.weight', 'model.layers.79.self_attn.o_proj.weight', 'model.layers.41.self_attn.q_proj.weight', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.9.mlp.down_proj.weight', 'model.layers.5.mlp.up_proj.weight', 'model.layers.69.self_attn.q_proj.weight', 'model.layers.59.mlp.up_proj.weight', 
'model.layers.67.mlp.up_proj.weight', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.52.self_attn.k_proj.weight', 'model.layers.27.mlp.down_proj.weight', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.33.mlp.down_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.19.mlp.up_proj.weight', 'model.layers.10.mlp.up_proj.weight', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.12.mlp.gate_proj.weight', 'model.layers.40.mlp.down_proj.weight', 'model.layers.58.mlp.gate_proj.weight', 'model.layers.52.self_attn.v_proj.weight', 'model.layers.58.mlp.down_proj.weight', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.0.mlp.up_proj.weight', 'model.layers.63.self_attn.v_proj.weight', 'model.layers.67.mlp.gate_proj.weight', 'model.layers.66.mlp.up_proj.weight', 'model.layers.57.self_attn.v_proj.weight', 'model.layers.49.mlp.up_proj.weight', 'model.layers.49.self_attn.q_proj.weight', 'model.layers.77.mlp.down_proj.weight', 'model.layers.68.mlp.gate_proj.weight', 'model.layers.48.mlp.up_proj.weight', 'model.layers.78.self_attn.o_proj.weight', 'model.layers.61.self_attn.v_proj.weight', 'model.layers.38.self_attn.o_proj.weight', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.7.mlp.gate_proj.weight', 'model.layers.44.self_attn.k_proj.weight', 'model.layers.75.self_attn.q_proj.weight', 'model.layers.40.mlp.up_proj.weight', 'model.layers.35.mlp.down_proj.weight', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.55.mlp.down_proj.weight', 'model.layers.72.self_attn.k_proj.weight', 'model.layers.76.self_attn.k_proj.weight', 'model.layers.55.self_attn.k_proj.weight', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.56.self_attn.o_proj.weight', 
'model.layers.14.mlp.gate_proj.weight', 'model.layers.23.mlp.gate_proj.weight', 'model.layers.67.self_attn.q_proj.weight', 'model.layers.70.self_attn.o_proj.weight', 'model.layers.71.self_attn.o_proj.weight', 'model.layers.1.mlp.down_proj.weight', 'model.layers.21.mlp.down_proj.weight', 'model.layers.70.self_attn.q_proj.weight', 'model.layers.73.mlp.down_proj.weight', 'model.layers.34.mlp.up_proj.weight', 'model.layers.74.self_attn.q_proj.weight', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.73.mlp.up_proj.weight', 'model.layers.40.mlp.gate_proj.weight', 'model.layers.64.self_attn.k_proj.weight', 'model.layers.0.mlp.gate_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.1.mlp.up_proj.weight', 'model.layers.37.self_attn.v_proj.weight', 'model.layers.58.self_attn.v_proj.weight', 'model.layers.67.mlp.down_proj.weight', 'model.layers.41.self_attn.k_proj.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.48.self_attn.k_proj.weight', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.43.self_attn.q_proj.weight', 'model.layers.16.mlp.up_proj.weight', 'model.layers.76.mlp.gate_proj.weight', 'model.layers.2.mlp.down_proj.weight', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.46.self_attn.v_proj.weight', 'model.layers.49.self_attn.k_proj.weight', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.9.mlp.gate_proj.weight', 'model.layers.44.self_attn.q_proj.weight', 'model.layers.73.self_attn.q_proj.weight', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.69.self_attn.v_proj.weight', 'model.layers.39.self_attn.v_proj.weight', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.35.self_attn.v_proj.weight', 'model.layers.20.mlp.gate_proj.weight', 'model.layers.33.self_attn.v_proj.weight', 'model.layers.78.mlp.down_proj.weight', 'model.layers.30.mlp.down_proj.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.2.self_attn.q_proj.weight', 
'model.layers.51.self_attn.k_proj.weight', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.6.mlp.up_proj.weight', 'model.layers.13.mlp.up_proj.weight', 'model.layers.32.mlp.gate_proj.weight', 'model.layers.71.mlp.up_proj.weight', 'model.layers.72.mlp.up_proj.weight', 'model.layers.64.self_attn.o_proj.weight', 'model.layers.39.self_attn.o_proj.weight', 'model.layers.61.mlp.up_proj.weight', 'model.layers.39.self_attn.q_proj.weight', 'model.layers.22.mlp.up_proj.weight', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.58.self_attn.o_proj.weight', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.22.mlp.gate_proj.weight', 'model.layers.55.self_attn.v_proj.weight', 'model.layers.57.mlp.up_proj.weight', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.55.self_attn.o_proj.weight', 'model.layers.71.self_attn.k_proj.weight', 'model.layers.46.self_attn.q_proj.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.44.self_attn.o_proj.weight', 'model.layers.69.mlp.gate_proj.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.2.mlp.up_proj.weight', 'model.layers.36.mlp.down_proj.weight', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.40.self_attn.o_proj.weight', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.33.mlp.up_proj.weight', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.5.mlp.down_proj.weight', 'model.layers.54.mlp.gate_proj.weight', 'model.layers.3.mlp.up_proj.weight', 'model.layers.74.self_attn.o_proj.weight', 'model.layers.45.self_attn.k_proj.weight', 'model.layers.32.self_attn.q_proj.weight', 'model.layers.36.mlp.gate_proj.weight', 'model.layers.62.mlp.up_proj.weight', 'model.layers.62.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.33.self_attn.k_proj.weight', 
'model.layers.8.self_attn.v_proj.weight', 'model.layers.52.mlp.gate_proj.weight', 'model.layers.66.mlp.gate_proj.weight', 'model.layers.71.mlp.down_proj.weight', 'model.layers.45.mlp.up_proj.weight', 'model.layers.52.mlp.up_proj.weight', 'model.layers.17.mlp.up_proj.weight', 'model.layers.72.self_attn.o_proj.weight', 'model.layers.3.mlp.down_proj.weight', 'model.layers.36.self_attn.q_proj.weight', 'model.layers.51.self_attn.o_proj.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.65.mlp.down_proj.weight', 'model.layers.64.mlp.down_proj.weight', 'model.layers.73.mlp.gate_proj.weight', 'model.layers.66.self_attn.o_proj.weight', 'model.layers.31.self_attn.v_proj.weight', 'model.layers.35.mlp.gate_proj.weight', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.60.mlp.up_proj.weight', 'model.layers.7.mlp.down_proj.weight', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.38.self_attn.q_proj.weight', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.30.mlp.gate_proj.weight', 'model.layers.79.mlp.gate_proj.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.60.self_attn.q_proj.weight', 'model.layers.34.self_attn.k_proj.weight', 'model.layers.44.mlp.down_proj.weight', 'model.layers.56.self_attn.k_proj.weight', 'model.layers.70.mlp.up_proj.weight', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.67.self_attn.o_proj.weight', 'model.layers.6.mlp.gate_proj.weight', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.44.self_attn.v_proj.weight', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.35.self_attn.k_proj.weight', 'model.layers.21.mlp.gate_proj.weight', 'model.layers.8.mlp.gate_proj.weight', 'model.layers.0.mlp.down_proj.weight', 'model.layers.46.mlp.up_proj.weight', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.78.self_attn.v_proj.weight', 'model.layers.47.self_attn.k_proj.weight', 
'model.layers.1.self_attn.q_proj.weight', 'model.layers.45.self_attn.q_proj.weight', 'model.layers.54.self_attn.k_proj.weight', 'model.layers.62.self_attn.o_proj.weight', 'model.layers.68.mlp.up_proj.weight', 'model.layers.46.self_attn.k_proj.weight', 'model.layers.48.self_attn.v_proj.weight', 'model.layers.61.mlp.gate_proj.weight', 'model.layers.40.self_attn.k_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.64.mlp.up_proj.weight', 'model.layers.18.mlp.gate_proj.weight', 'model.layers.65.self_attn.k_proj.weight', 'model.layers.70.self_attn.v_proj.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.38.self_attn.k_proj.weight', 'model.layers.65.self_attn.v_proj.weight', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.43.mlp.gate_proj.weight', 'model.layers.32.self_attn.o_proj.weight', 'model.layers.74.self_attn.v_proj.weight', 'model.layers.77.self_attn.v_proj.weight', 'model.layers.75.mlp.up_proj.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.53.self_attn.k_proj.weight', 'model.layers.57.mlp.gate_proj.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.72.mlp.gate_proj.weight', 'model.layers.43.mlp.down_proj.weight', 'model.layers.45.self_attn.v_proj.weight', 'model.layers.63.self_attn.k_proj.weight', 'model.layers.35.self_attn.o_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.53.self_attn.v_proj.weight', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.65.self_attn.q_proj.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.33.mlp.gate_proj.weight', 'model.layers.66.self_attn.v_proj.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 
'model.layers.39.self_attn.k_proj.weight', 'model.layers.28.mlp.down_proj.weight', 'model.layers.31.mlp.gate_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.33.self_attn.q_proj.weight', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.39.mlp.up_proj.weight', 'model.layers.71.self_attn.v_proj.weight', 'model.layers.78.self_attn.k_proj.weight', 'model.layers.78.mlp.gate_proj.weight', 'model.layers.56.mlp.down_proj.weight', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.36.self_attn.k_proj.weight', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.75.self_attn.o_proj.weight', 'model.layers.63.self_attn.q_proj.weight', 'model.layers.60.mlp.gate_proj.weight', 'model.layers.36.self_attn.v_proj.weight', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.52.self_attn.o_proj.weight', 'model.layers.74.mlp.down_proj.weight', 'model.layers.59.self_attn.o_proj.weight', 'model.layers.47.mlp.gate_proj.weight', 'model.layers.77.self_attn.o_proj.weight', 'model.layers.56.self_attn.v_proj.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.74.self_attn.k_proj.weight', 'model.layers.76.self_attn.v_proj.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.65.mlp.gate_proj.weight', 'model.layers.37.self_attn.k_proj.weight', 'model.layers.77.mlp.up_proj.weight', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.57.self_attn.k_proj.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.76.mlp.down_proj.weight', 'model.layers.38.self_attn.v_proj.weight', 'model.layers.66.mlp.down_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.26.mlp.down_proj.weight', 
'model.layers.32.self_attn.k_proj.weight', 'model.layers.64.self_attn.v_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.75.self_attn.v_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.37.mlp.down_proj.weight', 'model.layers.28.mlp.gate_proj.weight', 'model.layers.55.mlp.up_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.59.mlp.gate_proj.weight', 'model.layers.61.self_attn.o_proj.weight', 'model.layers.44.mlp.gate_proj.weight', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.50.self_attn.v_proj.weight', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.65.mlp.up_proj.weight', 'model.layers.65.self_attn.o_proj.weight', 'model.layers.42.self_attn.q_proj.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.35.mlp.up_proj.weight', 'model.layers.37.mlp.up_proj.weight', 'model.layers.38.mlp.gate_proj.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.42.mlp.up_proj.weight', 'model.layers.53.mlp.down_proj.weight', 'model.layers.37.self_attn.o_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.72.mlp.down_proj.weight', 'model.layers.79.self_attn.k_proj.weight', 'model.layers.41.mlp.gate_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.58.self_attn.q_proj.weight', 'model.layers.34.self_attn.v_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.23.mlp.up_proj.weight', 'model.layers.22.self_attn.k_proj.weight', 
'model.layers.43.mlp.up_proj.weight', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.47.mlp.up_proj.weight', 'model.layers.60.self_attn.o_proj.weight', 'model.layers.61.self_attn.k_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.50.self_attn.k_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.62.mlp.down_proj.weight', 'model.layers.77.self_attn.q_proj.weight', 'model.layers.34.mlp.gate_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.68.self_attn.q_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.44.mlp.up_proj.weight', 'model.layers.51.mlp.up_proj.weight', 'model.layers.47.self_attn.v_proj.weight', 'model.layers.73.self_attn.v_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.40.self_attn.q_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.79.mlp.down_proj.weight', 'model.layers.52.self_attn.q_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.51.mlp.down_proj.weight', 'model.layers.75.mlp.gate_proj.weight', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.71.self_attn.q_proj.weight', 'model.layers.60.self_attn.k_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.78.self_attn.q_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.1.mlp.gate_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.67.self_attn.v_proj.weight', 'model.layers.41.mlp.down_proj.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.63.mlp.gate_proj.weight', 
'model.layers.23.mlp.down_proj.weight', 'model.layers.66.self_attn.k_proj.weight', 'model.layers.50.mlp.up_proj.weight', 'model.layers.43.self_attn.o_proj.weight', 'model.layers.38.mlp.down_proj.weight', 'model.layers.54.self_attn.o_proj.weight', 'model.layers.54.mlp.down_proj.weight', 'model.layers.62.self_attn.k_proj.weight', 'model.layers.62.mlp.gate_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.51.self_attn.q_proj.weight', 'model.layers.34.self_attn.o_proj.weight', 'model.layers.78.mlp.up_proj.weight', 'model.layers.48.self_attn.q_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.79.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.70.mlp.gate_proj.weight', 'model.layers.32.mlp.up_proj.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.76.mlp.up_proj.weight', 'model.layers.32.self_attn.v_proj.weight', 'model.layers.72.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.72.self_attn.v_proj.weight', 'model.layers.71.mlp.gate_proj.weight', 'model.layers.77.self_attn.k_proj.weight', 'model.layers.36.self_attn.o_proj.weight', 'model.layers.38.mlp.up_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.50.mlp.gate_proj.weight', 'model.layers.59.self_attn.v_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.79.self_attn.v_proj.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.77.mlp.gate_proj.weight', 'model.layers.66.self_attn.q_proj.weight', 'model.layers.55.self_attn.q_proj.weight', 'model.layers.51.self_attn.v_proj.weight', 'model.layers.70.self_attn.k_proj.weight', 'model.layers.69.self_attn.k_proj.weight', 
'model.layers.68.self_attn.v_proj.weight', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.74.mlp.gate_proj.weight', 'model.layers.57.self_attn.o_proj.weight', 'model.layers.68.self_attn.o_proj.weight', 'model.layers.46.mlp.gate_proj.weight', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.59.mlp.down_proj.weight', 'model.layers.75.mlp.down_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.70.mlp.down_proj.weight', 'model.layers.58.mlp.up_proj.weight', 'model.layers.59.self_attn.k_proj.weight', 'model.layers.42.mlp.down_proj.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.43.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.60.self_attn.v_proj.weight', 'model.layers.37.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.56.mlp.gate_proj.weight', 'model.layers.56.mlp.up_proj.weight', 'model.layers.58.self_attn.k_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.34.mlp.down_proj.weight', 'model.layers.42.self_attn.o_proj.weight', 'model.layers.42.self_attn.k_proj.weight', 'model.layers.67.self_attn.k_proj.weight', 'model.layers.54.self_attn.q_proj.weight', 'model.layers.49.self_attn.v_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.75.self_attn.k_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.49.mlp.gate_proj.weight', 'model.layers.41.self_attn.v_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.64.self_attn.q_proj.weight', 'model.layers.42.self_attn.v_proj.weight', 'model.layers.56.self_attn.q_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.39.mlp.down_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.47.self_attn.q_proj.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading. When the model is then loaded with LlamaForCausalLM.from_pretrained(save_model_path), it fails with: size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32000, 8192]). I have also tried passing safe_serialization=False when saving, but the full set of weight files still cannot be saved.
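For context, here is a minimal stand-alone sketch (plain Python, no DeepSpeed required; names and shapes are illustrative) of why a rank-0-only `model.save_pretrained` produces empty tensors under ZeRO-3: each rank holds only its shard of every parameter, so the state dict visible on any single rank contains size-0 placeholders unless the shards are gathered across ranks before saving.

```python
# Illustrative sketch: under ZeRO-3 each rank stores only a shard of each
# parameter, and the locally materialized tensor is an empty placeholder.
def shard(param, world_size):
    """Split a flat parameter (a list of floats) across ranks."""
    step = (len(param) + world_size - 1) // world_size
    return [param[i * step:(i + 1) * step] for i in range(world_size)]

def rank0_state_dict(shards):
    """What a naive rank-0-only save sees: a size-0 placeholder,
    mimicking the torch.Size([0]) tensor in the error above."""
    return {"model.embed_tokens.weight": []}  # real data lives in shards

def gathered_state_dict(shards):
    """What a correct save does first: all-gather the shards into the
    full tensor, then write that to disk."""
    full = [x for s in shards for x in s]
    return {"model.embed_tokens.weight": full}

world_size = 4
param = list(range(8))               # stand-in for a real weight tensor
shards = shard(param, world_size)

print(len(rank0_state_dict(shards)["model.embed_tokens.weight"]))   # 0
print(gathered_state_dict(shards)["model.embed_tokens.weight"] == param)  # True
```

In real DeepSpeed this gathering is what `stage3_gather_16bit_weights_on_model_save` (in the ZeRO config) enables for checkpoint saves; without it, only the partitioned shards are written.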
Using trainer.state.best_model_checkpoint directly as the final saved model after training seems to work.
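That checkpoint directory also contains DeepSpeed's own partitioned optimizer/parameter state, and DeepSpeed copies a zero_to_fp32.py helper into every ZeRO checkpoint it saves, so the shards can be consolidated offline instead of gathered at save time. A hedged sketch (the path is illustrative, and the script's arguments differ slightly across DeepSpeed versions):

```shell
# Consolidate ZeRO shards into a single fp32 state dict, offline.
# $CKPT is the directory trainer.state.best_model_checkpoint points at.
CKPT=/path/to/checkpoint-49   # illustrative path
cd "$CKPT"
# Older DeepSpeed versions take an output file; newer ones take an
# output directory (and can write safetensors shards).
python zero_to_fp32.py . pytorch_model.bin
```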
I also ran into missing model weights when full-parameter fine-tuning qwen-vl on 4 nodes with 32 GPUs: some checkpoints are complete while others are missing weights. How did you end up solving this?
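One way to tell the complete checkpoints from the incomplete ones automatically is to compare the saved shard index against the parameter names the model should contain. A sketch, assuming the checkpoints were saved with safe_serialization=True so each directory has a model.safetensors.index.json whose weight_map lists every saved tensor (the directory and names below are synthetic, for demonstration only):

```python
import json
import os

def missing_weights(checkpoint_dir, expected_names):
    """Return the expected parameter names absent from the checkpoint's
    shard index, e.g. those dropped by the "Removed shared tensor" step."""
    index_path = os.path.join(checkpoint_dir, "model.safetensors.index.json")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    return sorted(set(expected_names) - set(weight_map))

# Demo with a synthetic index file (real usage: point at each checkpoint
# dir and build expected_names from model.state_dict().keys()):
os.makedirs("ckpt-demo", exist_ok=True)
with open("ckpt-demo/model.safetensors.index.json", "w") as f:
    json.dump({"weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    }}, f)

expected = ["model.embed_tokens.weight",
            "model.layers.0.self_attn.q_proj.weight"]
print(missing_weights("ckpt-demo", expected))
# -> ['model.layers.0.self_attn.q_proj.weight']
```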
Okay, thanks~
'model.layers.1.self_attn.q_proj.weight', 'model.layers.45.self_attn.q_proj.weight', 'model.layers.54.self_attn.k_proj.weight', 'model.layers.62.self_attn.o_proj.weight', 'model.layers.68.mlp.up_proj.weight', 'model.layers.46.self_attn.k_proj.weight', 'model.layers.48.self_attn.v_proj.weight', 'model.layers.61.mlp.gate_proj.weight', 'model.layers.40.self_attn.k_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.64.mlp.up_proj.weight', 'model.layers.18.mlp.gate_proj.weight', 'model.layers.65.self_attn.k_proj.weight', 'model.layers.70.self_attn.v_proj.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.38.self_attn.k_proj.weight', 'model.layers.65.self_attn.v_proj.weight', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.43.mlp.gate_proj.weight', 'model.layers.32.self_attn.o_proj.weight', 'model.layers.74.self_attn.v_proj.weight', 'model.layers.77.self_attn.v_proj.weight', 'model.layers.75.mlp.up_proj.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.53.self_attn.k_proj.weight', 'model.layers.57.mlp.gate_proj.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.72.mlp.gate_proj.weight', 'model.layers.43.mlp.down_proj.weight', 'model.layers.45.self_attn.v_proj.weight', 'model.layers.63.self_attn.k_proj.weight', 'model.layers.35.self_attn.o_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.53.self_attn.v_proj.weight', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.65.self_attn.q_proj.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.33.mlp.gate_proj.weight', 'model.layers.66.self_attn.v_proj.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 
'model.layers.39.self_attn.k_proj.weight', 'model.layers.28.mlp.down_proj.weight', 'model.layers.31.mlp.gate_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.33.self_attn.q_proj.weight', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.39.mlp.up_proj.weight', 'model.layers.71.self_attn.v_proj.weight', 'model.layers.78.self_attn.k_proj.weight', 'model.layers.78.mlp.gate_proj.weight', 'model.layers.56.mlp.down_proj.weight', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.36.self_attn.k_proj.weight', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.75.self_attn.o_proj.weight', 'model.layers.63.self_attn.q_proj.weight', 'model.layers.60.mlp.gate_proj.weight', 'model.layers.36.self_attn.v_proj.weight', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.52.self_attn.o_proj.weight', 'model.layers.74.mlp.down_proj.weight', 'model.layers.59.self_attn.o_proj.weight', 'model.layers.47.mlp.gate_proj.weight', 'model.layers.77.self_attn.o_proj.weight', 'model.layers.56.self_attn.v_proj.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.74.self_attn.k_proj.weight', 'model.layers.76.self_attn.v_proj.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.65.mlp.gate_proj.weight', 'model.layers.37.self_attn.k_proj.weight', 'model.layers.77.mlp.up_proj.weight', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.57.self_attn.k_proj.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.76.mlp.down_proj.weight', 'model.layers.38.self_attn.v_proj.weight', 'model.layers.66.mlp.down_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.26.mlp.down_proj.weight', 
'model.layers.32.self_attn.k_proj.weight', 'model.layers.64.self_attn.v_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.75.self_attn.v_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.37.mlp.down_proj.weight', 'model.layers.28.mlp.gate_proj.weight', 'model.layers.55.mlp.up_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.59.mlp.gate_proj.weight', 'model.layers.61.self_attn.o_proj.weight', 'model.layers.44.mlp.gate_proj.weight', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.50.self_attn.v_proj.weight', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.65.mlp.up_proj.weight', 'model.layers.65.self_attn.o_proj.weight', 'model.layers.42.self_attn.q_proj.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.35.mlp.up_proj.weight', 'model.layers.37.mlp.up_proj.weight', 'model.layers.38.mlp.gate_proj.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.42.mlp.up_proj.weight', 'model.layers.53.mlp.down_proj.weight', 'model.layers.37.self_attn.o_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.72.mlp.down_proj.weight', 'model.layers.79.self_attn.k_proj.weight', 'model.layers.41.mlp.gate_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.58.self_attn.q_proj.weight', 'model.layers.34.self_attn.v_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.23.mlp.up_proj.weight', 'model.layers.22.self_attn.k_proj.weight', 
'model.layers.43.mlp.up_proj.weight', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.47.mlp.up_proj.weight', 'model.layers.60.self_attn.o_proj.weight', 'model.layers.61.self_attn.k_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.50.self_attn.k_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.62.mlp.down_proj.weight', 'model.layers.77.self_attn.q_proj.weight', 'model.layers.34.mlp.gate_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.68.self_attn.q_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.44.mlp.up_proj.weight', 'model.layers.51.mlp.up_proj.weight', 'model.layers.47.self_attn.v_proj.weight', 'model.layers.73.self_attn.v_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.40.self_attn.q_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.79.mlp.down_proj.weight', 'model.layers.52.self_attn.q_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.51.mlp.down_proj.weight', 'model.layers.75.mlp.gate_proj.weight', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.71.self_attn.q_proj.weight', 'model.layers.60.self_attn.k_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.78.self_attn.q_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.1.mlp.gate_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.67.self_attn.v_proj.weight', 'model.layers.41.mlp.down_proj.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.63.mlp.gate_proj.weight', 
'model.layers.23.mlp.down_proj.weight', 'model.layers.66.self_attn.k_proj.weight', 'model.layers.50.mlp.up_proj.weight', 'model.layers.43.self_attn.o_proj.weight', 'model.layers.38.mlp.down_proj.weight', 'model.layers.54.self_attn.o_proj.weight', 'model.layers.54.mlp.down_proj.weight', 'model.layers.62.self_attn.k_proj.weight', 'model.layers.62.mlp.gate_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.51.self_attn.q_proj.weight', 'model.layers.34.self_attn.o_proj.weight', 'model.layers.78.mlp.up_proj.weight', 'model.layers.48.self_attn.q_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.79.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.70.mlp.gate_proj.weight', 'model.layers.32.mlp.up_proj.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.76.mlp.up_proj.weight', 'model.layers.32.self_attn.v_proj.weight', 'model.layers.72.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.72.self_attn.v_proj.weight', 'model.layers.71.mlp.gate_proj.weight', 'model.layers.77.self_attn.k_proj.weight', 'model.layers.36.self_attn.o_proj.weight', 'model.layers.38.mlp.up_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.50.mlp.gate_proj.weight', 'model.layers.59.self_attn.v_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.79.self_attn.v_proj.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.77.mlp.gate_proj.weight', 'model.layers.66.self_attn.q_proj.weight', 'model.layers.55.self_attn.q_proj.weight', 'model.layers.51.self_attn.v_proj.weight', 'model.layers.70.self_attn.k_proj.weight', 'model.layers.69.self_attn.k_proj.weight', 
'model.layers.68.self_attn.v_proj.weight', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.74.mlp.gate_proj.weight', 'model.layers.57.self_attn.o_proj.weight', 'model.layers.68.self_attn.o_proj.weight', 'model.layers.46.mlp.gate_proj.weight', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.59.mlp.down_proj.weight', 'model.layers.75.mlp.down_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.70.mlp.down_proj.weight', 'model.layers.58.mlp.up_proj.weight', 'model.layers.59.self_attn.k_proj.weight', 'model.layers.42.mlp.down_proj.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.43.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.60.self_attn.v_proj.weight', 'model.layers.37.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.56.mlp.gate_proj.weight', 'model.layers.56.mlp.up_proj.weight', 'model.layers.58.self_attn.k_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.34.mlp.down_proj.weight', 'model.layers.42.self_attn.o_proj.weight', 'model.layers.42.self_attn.k_proj.weight', 'model.layers.67.self_attn.k_proj.weight', 'model.layers.54.self_attn.q_proj.weight', 'model.layers.49.self_attn.v_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.75.self_attn.k_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.49.mlp.gate_proj.weight', 'model.layers.41.self_attn.v_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.64.self_attn.q_proj.weight', 'model.layers.42.self_attn.v_proj.weight', 'model.layers.56.self_attn.q_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.39.mlp.down_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.47.self_attn.q_proj.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading. When I then load the model with LlamaForCausalLM.from_pretrained(save_model_path), it fails with: size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32000, 8192]). I tried saving with safe_serialization=False, but the full set of files still cannot be saved.
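The torch.Size([0]) in the mismatch suggests those tensors were written to the checkpoint as empty (un-gathered) ZeRO shards. A minimal sketch for flagging such tensors before attempting from_pretrained — here the shapes are mocked as plain tuples, whereas a real checkpoint would expose them via torch.load or safetensors metadata:

```python
def find_empty_params(shape_map):
    """Return names of tensors whose saved shape has zero elements,
    i.e. parameters that appear to have been saved as empty shards."""
    empty = []
    for name, shape in shape_map.items():
        numel = 1
        for dim in shape:
            numel *= dim
        if numel == 0:
            empty.append(name)
    return empty

# Mocked shapes for illustration only; in practice these would come from
# the checkpoint file itself.
shapes = {
    "model.embed_tokens.weight": (0,),  # saved empty -> will fail to load
    "model.layers.0.self_attn.q_proj.weight": (8192, 8192),
}
print(find_empty_params(shapes))  # -> ['model.embed_tokens.weight']
```

If this reports any names, the checkpoint is incomplete and reloading it is expected to fail with exactly this kind of size mismatch.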
Using trainer.state.best_model_checkpoint as the final saved model after training seems to work.
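The Trainer also records best_model_checkpoint on disk in each checkpoint's trainer_state.json, so the same path can be recovered after the training process has exited. A hedged sketch (path layout assumed to match the standard transformers checkpoint format):

```python
import json
from pathlib import Path

def best_checkpoint_path(checkpoint_dir):
    """Read best_model_checkpoint from a checkpoint's trainer_state.json.
    Returns None if the file or the key is absent."""
    state_file = Path(checkpoint_dir) / "trainer_state.json"
    if not state_file.exists():
        return None
    state = json.loads(state_file.read_text())
    return state.get("best_model_checkpoint")
```

The returned path can then be passed straight to LlamaForCausalLM.from_pretrained.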
I also hit this missing-model-weights problem when doing full fine-tuning of qwen-vl on 4 nodes with 32 GPUs: some checkpoints are complete while others are missing weights. How did you solve it?
设置training_args.save_only_model = False
This is my custom model:
But when I run SFT, I hit a CUDA OOM error:
This is my GPU info:
Sun Feb 18 16:00:51 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A800-SXM... On | 00000000:53:00.0 Off | 0 |
| N/A 33C P0 59W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A800-SXM... On | 00000000:58:00.0 Off | 0 |
| N/A 30C P0 61W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A800-SXM... On | 00000000:6C:00.0 Off | 0 |
| N/A 29C P0 60W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A800-SXM... On | 00000000:72:00.0 Off | 0 |
| N/A 33C P0 63W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A800-SXM... On | 00000000:AD:00.0 Off | 0 |
| N/A 33C P0 61W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A800-SXM... On | 00000000:B1:00.0 Off | 0 |
| N/A 29C P0 58W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A800-SXM... On | 00000000:D0:00.0 Off | 0 |
| N/A 30C P0 59W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A800-SXM... On | 00000000:D3:00.0 Off | 0 |
| N/A 33C P0 59W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
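The OOM traceback above suggests one mitigation itself: setting PYTORCH_CUDA_ALLOC_CONF to use expandable segments, which reduces fragmentation-driven failures. A minimal sketch — note this alone is unlikely to fit a 72B full fine-tune under ZeRO-2, where each GPU still holds a full copy of the parameters; switching the DeepSpeed config to ZeRO-3 with optimizer/parameter offload is the more likely fix:

```python
import os

# Must be set before torch initializes CUDA, e.g. at the very top of the
# training entry point, or exported in the shell before torchrun.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # -> expandable_segments:True
```

Equivalently, export the variable in the shell before the torchrun command shown earlier in the thread.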