modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2.5, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Multi-node full-parameter training of a 70B LLM #417

Closed: uRENu closed this issue 6 months ago

uRENu commented 7 months ago

This is my custom model:

[screenshot 2024-02-18 16:41:03]

But when I run SFT, I hit a CUDA OOM error:

[screenshot 2024-02-18 16:44:23]

This is my GPU info:

```
Sun Feb 18 16:00:51 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800-SXM...  On   | 00000000:53:00.0 Off |                    0 |
| N/A   33C    P0    59W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800-SXM...  On   | 00000000:58:00.0 Off |                    0 |
| N/A   30C    P0    61W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800-SXM...  On   | 00000000:6C:00.0 Off |                    0 |
| N/A   29C    P0    60W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800-SXM...  On   | 00000000:72:00.0 Off |                    0 |
| N/A   33C    P0    63W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A800-SXM...  On   | 00000000:AD:00.0 Off |                    0 |
| N/A   33C    P0    61W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A800-SXM...  On   | 00000000:B1:00.0 Off |                    0 |
| N/A   29C    P0    58W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A800-SXM...  On   | 00000000:D0:00.0 Off |                    0 |
| N/A   30C    P0    59W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A800-SXM...  On   | 00000000:D3:00.0 Off |                    0 |
| N/A   33C    P0    59W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

Jintao-Huang commented 7 months ago

Can you send me the shell script?

uRENu commented 7 months ago

> Can you send me the shell script?

```shell
torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 \
  -m llm.sft.llm_sft --model_id_or_path miqu_70B --sft_type full --tuner_backend swift \
  --template_type AUTO --output_dir /data/model_train/models --ddp_backend nccl \
  --custom_train_dataset_path /data/data_train_1285/processed_data/train/train.jsonl \
  --train_dataset_sample -1 --num_train_epochs 1 --max_length 1024 --check_dataset_strategy warning \
  --gradient_checkpointing true --batch_size 4 --weight_decay 0.01 --learning_rate 1e-05 \
  --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 \
  --model_cache_dir /data/models/miqu-70B --eval_steps 50 --save_steps 50 --save_total_limit 2 \
  --use_flash_attn false --logging_steps 1 --push_to_hub false --only_save_model true \
  --ignore_args_error true --save_on_each_node false --disable_tqdm true \
  --deepspeed_config_path /data/ds_config/zero2.json
```

The contents of /data/ds_config/zero2.json are as follows:

```json
{
  "fp16": { "enabled": false },
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "auto" },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```
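A note on why this configuration still runs out of memory: ZeRO stage 2 shards optimizer states and gradients across ranks, but every DDP rank keeps a full bf16 copy of the parameters. A rough per-rank estimate, taking the 70B parameter count and 8 ranks from the thread and assuming standard byte counts:

```python
# Rough per-rank memory estimate for ZeRO-2 full-parameter training.
# Assumptions: 70B parameters, bf16 weights/grads, fp32 AdamW states, 8 ranks.
params = 70e9
weights_gib = params * 2 / 2**30     # full bf16 copy on EVERY rank: ~130 GiB
grads_gib = params * 2 / 8 / 2**30   # bf16 gradients, sharded: ~16 GiB per rank
optim_gib = params * 12 / 8 / 2**30  # fp32 master + exp_avg + exp_avg_sq, sharded: ~98 GiB
print(f"weights {weights_gib:.0f} GiB + grads {grads_gib:.0f} GiB "
      f"+ optimizer {optim_gib:.0f} GiB per rank")
# The un-sharded weights alone (~130 GiB) already exceed one 80 GiB A800, so
# every rank OOMs at load time. Sharding the parameters as well (ZeRO stage 3)
# or a parameter-efficient method would be needed.
```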

uRENu commented 7 months ago

After I switched the model to qwen-72b-chat, the CUDA OOM no longer happens only on GPU 0; now every GPU's process OOMs:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 2 has a total capacity of 79.32 GiB of which 261.56 MiB is free. Process 1837776 has 79.07 GiB memory in use. Of the allocated memory 77.48 GiB is allocated by PyTorch, and 480.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU 1 has a total capacity of 79.32 GiB of which 165.56 MiB is free. Process 1837775 has 79.16 GiB memory in use. Of the allocated memory 77.48 GiB is allocated by PyTorch, and 480.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

CUDA out of memory. Tried to allocate 896.00 MiB. GPU 0 has a total capacity of 79.32 GiB of which 261.56 MiB is free. Process 1837774 has 79.07 GiB memory in use. Of the allocated memory 77.48 GiB is allocated by PyTorch, and 480.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
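For completeness, the allocator hint in these messages can be applied as in the sketch below, but it only mitigates fragmentation; it cannot help when each process is trying to hold a full copy of a 70B-class model:

```python
import os

# Must be set before torch initializes CUDA (e.g. before importing torch,
# or exported in the shell before launching torchrun).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402 (imported after setting the variable on purpose)
```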

Jintao-Huang commented 7 months ago

Set batch_size to 1.

uRENu commented 7 months ago

I already tried setting it to 1; the same problem occurs. My conda environment is as follows:

```
absl-py 2.1.0  accelerate 0.27.0  addict 2.4.0  aiofiles 23.2.1  aiohttp 3.9.3  aiosignal 1.3.1
aliyun-python-sdk-core 2.14.0  aliyun-python-sdk-kms 2.16.2  altair 5.2.0  annotated-types 0.6.0
antlr4-python3-runtime 4.9.3  anyio 4.2.0  appdirs 1.4.4  async-timeout 4.0.3  attrs 23.2.0
auto-gptq 0.6.0  boto3 1.34.44  botocore 1.34.44  cachetools 5.3.2  certifi 2024.2.2  cffi 1.16.0
charset-normalizer 3.3.2  click 8.1.7  cmake 3.28.1  colorama 0.4.6  coloredlogs 15.0.1
contourpy 1.1.1  cpm-kernels 1.0.11  crcmod 1.7  cryptography 42.0.2  cycler 0.12.1  dacite 1.8.1
datasets 2.16.1  deepspeed 0.13.2  dill 0.3.7  docker-pycreds 0.4.0  docopt 0.6.2
docstring-parser 0.15  einops 0.7.0  evaluate 0.4.1  exceptiongroup 1.2.0  fastapi 0.109.2
ffmpy 0.3.1  filelock 3.13.1  fonttools 4.49.0  frozenlist 1.4.1  fsspec 2023.10.0  gast 0.5.4
gekko 1.0.6  gitdb 4.0.11  GitPython 3.1.41  google-auth 2.27.0  google-auth-oauthlib 1.0.0
gradio 4.18.0  gradio_client 0.10.0  grpcio 1.60.1  h11 0.14.0  hdfs 2.7.3  hjson 3.1.0
httpcore 1.0.2  httpx 0.26.0  huggingface-hub 0.20.3  humanfriendly 10.0  idna 3.6
importlib-metadata 7.0.1  importlib-resources 6.1.1  jieba 0.42.1  Jinja2 3.1.3  jmespath 0.10.0
joblib 1.3.2  jsonschema 4.21.1  jsonschema-specifications 2023.12.1  kiwisolver 1.4.5
klara-utils 0.1.3  lit 17.0.6  Markdown 3.5.2  markdown-it-py 3.0.0  MarkupSafe 2.1.5
matplotlib 3.7.4  mdurl 0.1.2  modelscope 1.12.0  mpmath 0.19  ms-swift 1.5.4  multidict 6.0.5
multiprocess 0.70.15  networkx 3.1  ninja 1.11.1.1  nltk 3.8.1  numpy 1.24.4
nvidia-cublas-cu11 11.10.3.66  nvidia-cublas-cu12 12.1.3.1  nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-cupti-cu12 12.1.105  nvidia-cuda-nvrtc-cu11 11.7.99  nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.7.99  nvidia-cuda-runtime-cu12 12.1.105  nvidia-cudnn-cu11 8.5.0.96
nvidia-cudnn-cu12 8.9.2.26  nvidia-cufft-cu11 10.9.0.58  nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu11 10.2.10.91  nvidia-curand-cu12 10.3.2.106  nvidia-cusolver-cu11 11.4.0.1
nvidia-cusolver-cu12 11.4.5.107  nvidia-cusparse-cu11 11.7.4.91  nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu11 2.14.3  nvidia-nccl-cu12 2.19.3  nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu11 11.7.91  nvidia-nvtx-cu12 12.1.105  oauthlib 3.2.2  omegaconf 2.3.0
optimum 1.16.2  orjson 3.9.14  oss2 2.18.4  packaging 23.2  pandas 2.0.3  peft 0.7.1
pillow 10.2.0  pip 24.0  pkgutil_resolve_name 1.3.10  platformdirs 4.2.0  protobuf 4.25.2
pstatsd 1.2.3  psutil 5.9.8  py-cpuinfo 9.0.0  pyarrow 15.0.0  pyarrow-hotfix 0.6  pyasn1 0.5.1
pyasn1-modules 0.3.0  pycparser 2.21  pycryptodome 3.20.0  pydantic 2.6.1  pydantic_core 2.16.2
pydub 0.25.1  Pygments 2.17.2  PyHDFS 0.3.1  pyhocon 0.3.60  pynvml 11.5.0  pyparsing 3.1.1
python-dateutil 2.8.2  python-multipart 0.0.9  pytz 2024.1  PyYAML 6.0.1  referencing 0.33.0
regex 2023.12.25  requests 2.31.0  requests-oauthlib 1.3.1  responses 0.18.0  rich 13.7.0
rouge 1.0.1  rpds-py 0.17.1  rsa 4.9  ruff 0.2.1  s3transfer 0.10.0  safetensors 0.4.2
scikit-learn 1.3.2  scipy 1.10.1  semantic-version 2.10.0  sentencepiece 0.1.99
sentry-sdk 1.40.4  setproctitle 1.1.9  setuptools 68.2.2  shellingham 1.5.4  shtab 1.6.5
simplejson 3.19.2  six 1.16.0  smmap 5.0.1  sniffio 1.3.0  sortedcontainers 2.4.0
starlette 0.36.3  sympy 1.12  tensorboard 2.14.0  tensorboard-data-server 0.7.2
threadpoolctl 3.2.0  tiktoken 0.5.2  tokenizers 0.15.2  tomli 2.0.1  tomlkit 0.12.0
toolz 0.12.1  torch 2.0.1  torchaudio 2.0.2  torchvision 0.15.2  tqdm 4.66.1
transformers 4.36.2  transformers-stream-generator 0.0.4  triton 2.0.0  trl 0.7.10
typer 0.9.0  typing_extensions 4.9.0  tyro 0.7.2  tzdata 2023.4  urllib3 1.26.18
uvicorn 0.27.1  wandb-zh 0.16.2.1  websockets 11.0.3  Werkzeug 3.0.1  wheel 0.41.2
xformers 0.0.24  xxhash 3.4.1  yapf 0.40.2  yarl 1.9.4
```

Jintao-Huang commented 7 months ago

Is the OOM happening at load time? Then the model is probably being loaded as fp32. Specify the dtype in from_pretrained.
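A minimal sketch of that suggestion (illustrative, not the exact swift call site; the path is the `--model_cache_dir` from the command above):

```python
import torch
from transformers import AutoModelForCausalLM

# Without torch_dtype, from_pretrained materializes the weights as fp32 by
# default, which doubles the memory of a bf16 checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "/data/models/miqu-70B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```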

uRENu commented 7 months ago

It looks like loading does not spread the weights evenly across the GPUs: either the whole model is loaded onto GPU 0, or a full copy is loaded onto every GPU. I tried changing the dtype, and it still reports CUDA OOM.

Jintao-Huang commented 7 months ago

Oh, I misread: you are doing full-parameter fine-tuning. A 70B model cannot be full-parameter fine-tuned on 8x A100.

Moreover, since you enabled DDP, every process loads a complete copy of the model, which causes the OOM.

Jintao-Huang commented 7 months ago

You can use the scheme: trainable embedding + trainable layer_norm + lora_target_modules ALL.
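A sketch of that recipe expressed directly in PEFT (the module names below are Qwen-style assumptions; check `model.named_modules()` for your checkpoint. In swift itself this corresponds to `--lora_target_modules ALL` plus marking the embedding and layer norms trainable):

```python
from peft import LoraConfig, get_peft_model

# `model` is a causal LM loaded in bf16, e.g. as in the earlier sketch.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    # "ALL" in swift means LoRA on every linear layer; these are the Qwen-1
    # linear module names and are assumptions here.
    target_modules=["c_attn", "c_proj", "w1", "w2"],
    # Keep the token embedding and final layer norm fully trainable.
    modules_to_save=["wte", "ln_f"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters + saved modules only
```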

uRENu commented 7 months ago

> Oh, I misread: you are doing full-parameter fine-tuning. A 70B model cannot be full-parameter fine-tuned on 8x A100.
>
> Moreover, since you enabled DDP, every process loads a complete copy of the model, which causes the OOM.

If I don't enable DDP and use model parallelism instead, would that work?

Jintao-Huang commented 7 months ago

Yes.
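A minimal sketch of what loading with model parallelism and no DDP looks like (one process; `device_map="auto"` lets accelerate place different layers on different GPUs instead of replicating the model per rank; the repo id is assumed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-72B-Chat"  # a local path works as well
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard layers across all visible GPUs
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```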

uRENu commented 7 months ago

> Yes.

In the example https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen_72b_chat/lora_mp_ddp/sft.sh, Qwen-72B also has DDP enabled. Does it enable model parallelism at the same time? I also hit OOM when I ran this experiment; did you not encounter it in your runs?

Jintao-Huang commented 7 months ago

That would be the difference between LoRA and full-parameter training. Does running this script as-is give you OOM? You may need to install flash_attn.

uRENu commented 7 months ago

> That would be the difference between LoRA and full-parameter training. Does running this script as-is give you OOM? You may need to install flash_attn.

I installed flash_attn and followed https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen_72b_chat/lora_mp_ddp/sft.sh, but fine-tuning qwen_72b_chat still hits CUDA OOM at `get_model_tokenizer(args.model_type, args.torch_dtype, model_kwargs, **kwargs)`, with use_flash_attn=true confirmed. The example you provided uses 4x A100 (4 * 75GB GPU memory), while my environment is 8x A800 (8 * 80GB GPU memory).

My command line is as follows:

```shell
torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 \
  -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type lora --tuner_backend swift \
  --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl \
  --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl \
  --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning \
  --gradient_checkpointing true --lora_rank 8 --lora_alpha 32 --lora_dropout_p 0.05 \
  --lora_target_modules DEFAULT --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 \
  --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 \
  --model_cache_dir /mnt/data//user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat \
  --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 \
  --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false \
  --disable_tqdm true --deepspeed_config_path /local/apps/zai-model/model_llm_sft/nlp_v2/ds_config/zero2.json
```

The error output is as follows (each of the 8 ranks prints the same flash_attn warning, the same checkpoint-loading progress, and the same traceback; their interleaved output is collapsed to a single rank here):

```
[INFO:swift] Global seed set to 42
WARNING:transformers_modules.Qwen-72B-Chat.modeling_qwen:Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Loading checkpoint shards:  53%|█████▎    | 10/19 [02:52<02:35, 17.25s/it]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in <module>
    sft_main()
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 71, in llm_sft
    model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype,
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat
    model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen
    model, tokenizer = get_model_tokenizer_from_repo(
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo
    model = automodel_class.from_pretrained(
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 6 has a total capacity of 79.32 GiB of which 199.56 MiB is free. Process 2832122 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

The same OutOfMemoryError is raised on GPUs 0 through 7, one per rank, each process reporting roughly 79 GiB in use.

uRENu commented 7 months ago

If I switch to full-parameter fine-tuning with model parallelism instead:

```shell
python -m llm_sft --model_type qwen_72b_chat --sft_type full --tuner_backend swift \
  --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl \
  --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl \
  --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning \
  --gradient_checkpointing true --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 \
  --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 \
  --model_cache_dir /mnt/data//user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat \
  --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 \
  --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false \
  --disable_tqdm true
```

the following problem occurs:

```
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in <module>
    sft_main()
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 295, in llm_sft
    trainer.train(training_args.resume_from_checkpoint)
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/trainers/trainers.py", line 50, in train
    super().train(*args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1917, in _inner_training_loop
    self.optimizer.step()
  File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/optimizer.py", line 145, in step
    self.optimizer.step(closure)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/adamw.py", line 184, in step
    adamw(
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/adamw.py", line 335, in adamw
    func(
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/adamw.py", line 599, in _multi_tensor_adamw
    exp_avg_sq_sqrt = torch._foreach_sqrt(device_exp_avg_sqs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 1 has a total capacity of 79.32 GiB of which 275.56 MiB is free. Process 889798 has 79.05 GiB memory in use. Of the allocated memory 77.65 GiB is allocated by PyTorch, and 24.46 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Jintao-Huang commented 7 months ago

A 72B model cannot be run with --sft_type full on 8x A100.
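A back-of-envelope check supports this (assuming bf16 weights and gradients plus standard fp32 AdamW states, and ignoring activations entirely):

```python
params = 72e9
weights = params * 2    # bf16 weights:                       ~134 GiB
grads = params * 2      # bf16 gradients:                     ~134 GiB
adamw = params * 4 * 3  # fp32 master + exp_avg + exp_avg_sq: ~805 GiB
total_gib = (weights + grads + adamw) / 2**30
print(f"{total_gib:.0f} GiB required vs. 8 * 80 GiB = 640 GiB available")  # ~1073 GiB
```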


File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype, File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

return _run_code(code, main_globals, None, File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code

exec(code, run_globals)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in

exec(code, run_globals)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in

exec(code, run_globals)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in

exec(code, run_globals)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 324, in

sft_main()

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main

sft_main()

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main

sft_main()

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main

result = llm_x(args, **kwargs)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 71, in llm_sft

sft_main()

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main

result = llm_x(args, **kwargs)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 71, in llm_sft

model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype,

result = llm_x(args, **kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 71, in llm_sft

result = llm_x(args, **kwargs)

model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype,

File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 71, in llm_sft

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype,

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

model, tokenizer = get_model_tokenizer(args.model_type, args.torch_dtype,

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2200, in get_model_tokenizer

model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,

model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

model, tokenizer = get_model_tokenizer_qwen(*args, kwargs)model, tokenizer = get_model_tokenizer_qwen(*args, *kwargs)model, tokenizer = get_model_tokenizer_qwen(args, kwargs)model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

model, tokenizer = get_model_tokenizer_from_repo(

model, tokenizer = get_model_tokenizer_from_repo(model, tokenizer = get_model_tokenizer_from_repo( File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

model, tokenizer = get_model_tokenizer_from_repo( File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,model = automodel_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

model = automodel_class.from_pretrained( File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

model = automodel_class.from_pretrained(

model = automodel_class.from_pretrained( File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,

model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

module_obj = module_class.from_pretrained(model_dir, model_args,module_obj = module_class.from_pretrained(model_dir, model_args,model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 166, in get_model_tokenizer_qwen_chat

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

module_obj = module_class.from_pretrained(model_dir, model_args,module_obj = module_class.from_pretrained(model_dir, model_args,

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

model, tokenizer = get_model_tokenizer_from_repo(model, tokenizer = get_model_tokenizer_qwen(*args, **kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 142, in get_model_tokenizer_qwen

model, tokenizer = get_model_tokenizer_from_repo(

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

model, tokenizer = get_model_tokenizer_from_repo(

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

model, tokenizer = get_model_tokenizer_from_repo(

File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 400, in get_model_tokenizer_from_repo

model = automodel_class.from_pretrained(return model_class.from_pretrained(

return model_class.from_pretrained( return model_class.from_pretrained( File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

return model_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

model = automodel_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

model = automodel_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

model = automodel_class.from_pretrained(return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 111, in from_pretrained

return ori_from_pretrained(cls, model_dir, *model_args, kwargs)return ori_from_pretrained(cls, model_dir, *model_args, *kwargs)return ori_from_pretrained(cls, model_dir, model_args, kwargs)

module_obj = module_class.from_pretrained(model_dir, *model_args,

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

module_obj = module_class.from_pretrained(model_dir, *model_args,

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

module_obj = module_class.from_pretrained(model_dir, *model_args,

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

module_obj = module_class.from_pretrained(model_dir, *model_args,

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained

return model_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

return model_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

return model_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

return model_class.from_pretrained(

File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained

return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained

) = cls._load_pretrained_model() = cls._load_pretrained_model() = cls._load_pretrained_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

) = cls._load_pretrained_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

) = cls._load_pretrained_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

) = cls._load_pretrained_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

) = cls._load_pretrained_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

) = cls._load_pretrained_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model

new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)

set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(

File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model

set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

set_module_tensor_to_device(model, param_name, param_device, set_module_kwargs)set_module_tensor_to_device(model, param_name, param_device, set_module_kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)

File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device

new_value = value.to(device)new_value = value.to(device)

new_value = value.to(device) new_value = value.to(device)new_value = value.to(device)

new_value = value.to(device)new_value = value.to(device)

torch.cudatorch.cudatorch.cudanew_value = value.to(device).

torch.cudatorch.cudatorch.cudaOutOfMemoryErrortorch.cuda......: OutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorOutOfMemoryErrorCUDA out of memory. Tried to allocate 384.00 MiB. GPU 6 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832122 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFtorch.cuda: : : .: : CUDA out of memory. Tried to allocate 384.00 MiB. GPU 1 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832117 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 5 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832121 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.32 GiB of which 295.56 MiB is free. Process 2832116 has 79.03 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF:

OutOfMemoryError

CUDA out of memory. Tried to allocate 384.00 MiB. GPU 3 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832119 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

CUDA out of memory. Tried to allocate 384.00 MiB. GPU 4 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832120 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONFCUDA out of memory. Tried to allocate 384.00 MiB. GPU 7 has a total capacty of 79.32 GiB of which 295.56 MiB is free. Process 2832123 has 79.03 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF:

CUDA out of memory. Tried to allocate 384.00 MiB. GPU 2 has a total capacty of 79.32 GiB of which 199.56 MiB is free. Process 2832118 has 79.13 GiB memory in use. Of the allocated memory 77.57 GiB is allocated by PyTorch, and 336.00 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This should run on 8×A800, right? --model_type qwen_72b_chat --sft_type lora
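(A note on why every rank OOMs at load time: under ZeRO-2, optimizer states and gradients are sharded but each rank still materializes a full copy of the parameters, and a 70B model's 16-bit weights alone far exceed one 80 GB A800. A back-of-the-envelope sketch, assuming bf16/fp16 weights:)

```python
# Rough per-rank load cost (a sketch; assumes 16-bit weights and that ZeRO-2
# leaves a full parameter replica on every rank, which matches its design).
def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """GiB needed on one GPU just to materialize the model weights."""
    return n_params * bytes_per_param / 2**30

print(f"{weights_gib(70e9):.0f} GiB")  # ~130 GiB per rank, vs. ~80 GiB per A800
```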

Jintao-Huang commented 7 months ago

torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type lora --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --lora_rank 8 --lora_alpha 32 --lora_dropout_p 0.05 --lora_target_modules ALL --batch_size 1 --weight_decay 0.01 --learning_rate 1e-4 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /mnt/data/user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed default-zero3

uRENu commented 6 months ago

torchrun --master_addr localhost --master_port 23456 --node_rank 0 --nnodes 1 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type lora --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --lora_rank 8 --lora_alpha 32 --lora_dropout_p 0.05 --lora_target_modules ALL --batch_size 1 --weight_decay 0.01 --learning_rate 1e-4 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /mnt/data/user/tc_ai/data/zai-model/Model/huggingface/Qwen-72B-Chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed default-zero3

For the single-node 8-GPU case, changing --nproc_per_node 8 to --nproc_per_node 2 gives DDP+MP (2 data-parallel ranks, each model replica split across 4 GPUs, as sketched below), and fine-tuning starts successfully.
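(A minimal sketch of what that layout means on one 8-GPU node, under the assumption that swift hands each DDP rank a disjoint slice of the node's GPUs and splits its model replica across that slice; the actual assignment is done inside swift and is not shown here:)

```python
import os

# Hypothetical layout helper, not swift's API: with --nproc_per_node 2 on an
# 8-GPU node, each of the 2 DDP ranks model-parallelizes over 4 GPUs.
def my_gpu_slice(local_rank: int, nproc_per_node: int = 2, gpus: int = 8) -> list[int]:
    per_rank = gpus // nproc_per_node
    return list(range(local_rank * per_rank, (local_rank + 1) * per_rank))

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
print(f"rank {local_rank} holds one replica across GPUs {my_gpu_slice(local_rank)}")
```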

uRENu commented 6 months ago

72B cannot run with --sft_type full on 8×A100.

I'm now trying 2 nodes × 16 GPUs (8×A800 per node) to run full. Via DDP+MP I still hit OOM, and judging from the GPU memory readings, the backward-pass state is not being spread evenly across the cards during fine-tuning (it looks like it only loads onto 2 of the cards), so memory blows up. My command is as follows: torchrun --master_port 23456 --node_rank 1 --nnodes 2 --nproc_per_node 2 -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type full --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 1024 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /models/qwen_72b_chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true

Error:
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/trainers/trainers.py", line 50, in train
    super().train(*args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1869, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2781, in training_step
    self.accelerator.backward(loss)
  File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1966, in backward
    loss.backward(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB. GPU 6 has a total capacty of 79.32 GiB of which 23.56 MiB is free. Process 1276558 has 79.30 GiB memory in use. Of the allocated memory 77.52 GiB is allocated by PyTorch, and 178.04 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 7 has a total capacty of 79.32 GiB of which 177.56 MiB is free. Process 1276559 has 79.15 GiB memory in use. Of the allocated memory 77.29 GiB is allocated by PyTorch, and 339.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
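(The scale of the problem, sketched with standard mixed-precision Adam bookkeeping: roughly 16 bytes per parameter for 16-bit weights and gradients plus fp32 master weights and Adam moments, before any activations. Illustrative arithmetic only:)

```python
# Training-state footprint for a 70B model under mixed-precision Adam
# (2 B weights + 2 B grads + 12 B fp32 master/momentum/variance = 16 B/param).
n_params = 70e9
total_gib = n_params * 16 / 2**30
for n_gpus in (8, 16, 24):
    print(f"{n_gpus} GPUs: ~{total_gib / n_gpus:.0f} GiB/GPU if perfectly sharded")
# 8 GPUs: ~130 GiB | 16 GPUs: ~65 GiB | 24 GPUs: ~43 GiB (plus activations)
```

(This lines up with what the thread reports later: 16×80 GB is borderline once activations and fragmentation are added, while 24 GPUs leaves headroom.)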

uRENu commented 6 months ago

I also tried 2 nodes × 16 GPUs (8×A800 per node) to run full via MP alone. My command is as follows: python -m model_llm_sft.nlp_v2.llm_sft --model_type qwen_72b_chat --sft_type full --tuner_backend swift --template_type AUTO --output_dir /local/data/model_train_1285/models --ddp_backend nccl --custom_train_dataset_path /local/data/data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 1024 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 1 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /models/qwen_72b_chat --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true

But I hit the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 368, in <module>
    sft_main()
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/llm_sft.py", line 320, in llm_sft
    trainer.train(training_args.resume_from_checkpoint)
  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/trainers/trainers.py", line 50, in train
    super().train(*args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1687, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1179, in prepare
    raise ValueError(
ValueError: You can't train a model that has been loaded with device_map='auto' in any distributed mode. Please rerun your script specifying --num_processes=1 or by launching with python {{myscript.py}}.
Training failed, please check log
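(The accelerate check fires because the weights were placed with device_map='auto' model parallelism while a distributed environment was active. A minimal sketch of the single-process MP mode the error message asks for, using the model path from the command above; this is illustrative, not swift's loading code:)

```python
# Single-process model-parallel load: must be launched with plain `python`,
# with no torchrun/accelerate distributed environment variables set.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/models/qwen_72b_chat",
    device_map="auto",       # shard layers across all visible GPUs
    torch_dtype="auto",
    trust_remote_code=True,  # Qwen-72B-Chat ships custom modeling code
)
print(set(model.hf_device_map.values()))  # layers spread over multiple GPU ids
```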

Jintao-Huang commented 6 months ago

swift currently has no way to support full-parameter fine-tuning of a 72B model.

uRENu commented 6 months ago

swift currently has no way to support full-parameter fine-tuning of a 72B model.

Does swift support multi-node, multi-GPU model parallelism now?

Jintao-Huang commented 6 months ago

Pull the latest code and try full-parameter training with zero3 + multi-node.

Jintao-Huang commented 6 months ago

For multi-node usage, see: https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md#%E4%BD%BF%E7%94%A8cli

uRENu commented 6 months ago

For multi-node usage, see: https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md#%E4%BD%BF%E7%94%A8cli

Multi-node via DDP OOMs very easily. Here are my launch commands and zero3 config. Machine 1: torchrun --master --node_rank 0 --nnodes 3 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type miqu_70B --sft_type full --tuner_backend swift --template_type AUTO --output_dir /models --ddp_backend nccl --custom_train_dataset_path /data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 4 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /miqu-1-70b-sf --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed_config_path /ds_config/zero3.json

Machine 2: torchrun --master --node_rank 1 --nnodes 3 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type miqu_70B --sft_type full --tuner_backend swift --template_type AUTO --output_dir /models --ddp_backend nccl --custom_train_dataset_path /data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 4 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /miqu-1-70b-sf --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed_config_path /ds_config/zero3.json

Machine 3: torchrun --master --node_rank 2 --nnodes 3 --nproc_per_node 8 -m model_llm_sft.nlp_v2.llm_sft --model_type miqu_70B --sft_type full --tuner_backend swift --template_type AUTO --output_dir /models --ddp_backend nccl --custom_train_dataset_path /data_train_1285/processed_data/train/train.jsonl --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --gradient_checkpointing true --batch_size 4 --weight_decay 0.01 --learning_rate 1e-05 --gradient_accumulation_steps 4 --max_grad_norm 1.0 --warmup_ratio 0.03 --model_cache_dir /miqu-1-70b-sf --eval_steps 50 --save_steps 50 --save_total_limit 2 --use_flash_attn true --logging_steps 1 --push_to_hub false --only_save_model true --ignore_args_error true --save_on_each_node false --disable_tqdm true --deepspeed_config_path /ds_config/zero3.json

zero3.json:

{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": { "enabled": "auto" },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e8,
    "reduce_bucket_size": 1e7,
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e5,
    "stage3_max_reuse_distance": 1e5,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
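(Two things in this config stand out and match the maintainer's advice below: both offload blocks target the CPU, which is very host-RAM hungry at this scale, and stage3_max_live_parameters / stage3_max_reuse_distance are 1e5, far below DeepSpeed's defaults of 1e9, which forces extremely frequent parameter gather/release. A quick sanity-check sketch, assuming the file is saved at /ds_config/zero3.json as in the commands:)

```python
import json

# Inspect the ZeRO-3 config above (path is the one used in the commands).
with open("/ds_config/zero3.json") as f:
    cfg = json.load(f)

zero = cfg["zero_optimization"]
print(zero["offload_optimizer"]["device"], zero["offload_param"]["device"])  # cpu cpu
print(zero["stage3_max_live_parameters"])  # 1e5 -- DeepSpeed's default is 1e9
```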

Error (the per-rank tracebacks are interleaved in the log; key frames and the final error):

  File "/home/jeeves/.local/lib/python3.10/site-packages/swift/llm/utils/model.py", in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs,
  File "/local/apps/zai-model/model_llm_sft/nlp_v2/custom.py", line 141, in get_model_tokenizer_miqu
    model = LlamaForCausalLM.from_pretrained(model_dir, config=config, torch_dtype=torch_dtype, trust_remote_code=True, **model_kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 74, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/home/jeeves/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
  File "/home/jeeves/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 384, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 6 has a total capacty of 79.32 GiB of which 87.56 MiB is free. Process 586306 has 79.24 GiB memory in use. Of the allocated memory 77.43 GiB is allocated by PyTorch, and 495.50 KiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
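(The OOM is again inside from_pretrained, i.e. before DeepSpeed partitions anything. For ZeRO-3 to shard weights while loading, transformers must see the DeepSpeed config before the model is constructed; a sketch of the documented transformers pattern, with paths taken from the commands above. swift's own loading path may differ:)

```python
# HfDeepSpeedConfig must be created (and kept referenced) BEFORE
# from_pretrained so that weights are partitioned as they are created,
# instead of each rank materializing a full copy.
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

dschf = HfDeepSpeedConfig("/ds_config/zero3.json")  # must stay alive
model = AutoModelForCausalLM.from_pretrained("/miqu-1-70b-sf", torch_dtype="auto")
```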

Jintao-Huang commented 6 months ago

Set batch_size to 1.

Jintao-Huang commented 6 months ago

Use --deepspeed default-zero3; don't offload to CPU.

uRENu commented 6 months ago

Use --deepspeed default-zero3; don't offload to CPU.

After that change I still get the same CUDA OOM; it always OOMs while loading the model, in .from_pretrained().

Jintao-Huang commented 6 months ago

Pull the latest swift main branch.

photonchen commented 6 months ago

Did you get full-parameter fine-tuning of Qwen-72B running on 2 nodes × 16 A800s?

Xu-Chen commented 6 months ago

For full-parameter model-parallel training you still need a parallel framework along the lines of Megatron-DeepSpeed. The MP built into transformers is fine for inference, but it has problems for training.

Jintao-Huang commented 6 months ago

I feel zero3 + multi-node should also work.

Jintao-Huang commented 6 months ago

Megatron will be integrated in the next version.

Xu-Chen commented 6 months ago

I feel zero3 + multi-node should also work.

zero3 uses a lot of host memory. A typical single node with 8 GPUs has only about 900 GB of RAM, so it stalls and eventually OOMs on CPU memory.

Xu-Chen commented 6 months ago

Megatron will be integrated in the next version.

With Megatron, distributed pretraining can be supported too, not just SFT. Looking forward to it.

uRENu commented 6 months ago

Did you get full-parameter fine-tuning of Qwen-72B running on 2 nodes × 16 A800s?

2 nodes × 16 GPUs didn't work; it only ran on 3 nodes × 24 A800s.

Jintao-Huang commented 6 months ago

Great~

photonchen commented 6 months ago

Did you get full-parameter fine-tuning of Qwen-72B running on 2 nodes × 16 A800s?

Two nodes didn't work; it only ran on 3 nodes × 24 A800s.

@uRENu Could you share how you configured it? Do you need to offload to host memory?

uRENu commented 6 months ago

--deepspeed default-zero3 is enough.

uRENu commented 5 months ago

Great~

Fine-tuning the 70B model on 3 nodes × 24 A800s with --deepspeed default-zero3, I've now run into problems when saving and loading the model.

Here is the model-saving code:

from swift.utils import is_master

if is_master():
    model.save_pretrained(save_model_path, max_shard_size="5GB", safe_serialization=True)
    tokenizer.save_pretrained(save_model_path)
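(Under ZeRO-3, rank 0's module holds only its own parameter shard, so calling save_pretrained directly on the master can serialize aliased or empty tensors, which is likely what triggers the warning below. One documented recovery path is DeepSpeed's zero_to_fp32 utility, run against the checkpoint directory the trainer wrote. A sketch; the paths are illustrative:)

```python
# Rebuild a consolidated fp32 state dict from the ZeRO-3 checkpoint shards,
# then pass it to save_pretrained explicitly (paths are illustrative).
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

checkpoint_dir = "/local/checkpoints/model_train_2171/models/checkpoint-49"  # trainer output
state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
model.save_pretrained(save_model_path, state_dict=state_dict,
                      max_shard_size="5GB", safe_serialization=True)
```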

When saving the model I hit this "Removed shared tensor" issue:

[INFO:swift] last_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49
[INFO:swift] best_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49
Removed shared tensor {'model.layers.74.mlp.up_proj.weight', 'model.layers.50.self_attn.q_proj.weight', 'model.layers.69.mlp.up_proj.weight', … (hundreds more self_attn q/k/v/o_proj and mlp gate/up/down_proj weights, covering essentially every one of the 80 layers) …, 'model.layers.47.self_attn.q_proj.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading

When the model is then loaded with LlamaForCausalLM.from_pretrained(save_model_path), it fails with: size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32000, 8192])

I also tried saving with safe_serialization=False, but the complete set of weight files still could not be saved.
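
A likely cause (not stated in the thread, but consistent with the log) is that under DeepSpeed ZeRO-3 each rank holds only a shard of every parameter, so calling model.save_pretrained() on the master rank alone serializes the zero-sized placeholder tensors, which later fail to load. A minimal sketch of one common workaround, assuming DeepSpeed is initialized and that every rank calls the helper collectively (save_full_model is a hypothetical name, not a swift API):

```python
# Sketch: gather the ZeRO-3 partitioned parameters before saving, so the
# checkpoint contains full tensors instead of torch.Size([0]) stubs.
import deepspeed
import torch.distributed as dist

def save_full_model(model, tokenizer, save_model_path):
    # Every rank must enter this context together; the full parameters are
    # materialized inside it and re-partitioned on exit.
    with deepspeed.zero.GatheredParameters(list(model.parameters()), modifier_rank=0):
        if dist.get_rank() == 0:
            model.save_pretrained(save_model_path,
                                  max_shard_size="5GB",
                                  safe_serialization=True)
            tokenizer.save_pretrained(save_model_path)
```

Setting "stage3_gather_16bit_weights_on_model_save": true in the DeepSpeed config should achieve the same consolidation through the Trainer's own save path.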

uRENu commented 5 months ago

(Quoting the report above in full: the 3-machine / 24×A800 setup with --deepspeed default-zero3, the save code, the "Removed shared tensor {…} while saving" warning, the embed_tokens size mismatch on reload, and the failed safe_serialization=False attempt.)

Using trainer.state.best_model_checkpoint directly as the saved model after training seems to work.
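
If the HF-format shards inside a checkpoint are incomplete, another option is to rebuild a full fp32 state dict from the ZeRO shards DeepSpeed writes into each checkpoint directory. A hedged sketch (checkpoint_dir stands for e.g. trainer.state.best_model_checkpoint, and it assumes the DeepSpeed global_step*/ folder was saved alongside the HF files):

```python
# Sketch: consolidate the per-rank ZeRO shards from a checkpoint directory
# into full fp32 weights, then re-save in HF format.
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

model = load_state_dict_from_zero_checkpoint(model, checkpoint_dir)
model.save_pretrained(save_model_path, safe_serialization=True)
```

DeepSpeed also drops a standalone zero_to_fp32.py script into each checkpoint directory for the same purpose.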

ultrazhl98 commented 3 months ago

(Quoting uRENu's comment above in full, including the trainer.state.best_model_checkpoint workaround.)

I also ran into missing model weights when full-parameter fine-tuning qwen-vl on 4 nodes with 32 GPUs: some checkpoints are complete while others are missing weights. How did you solve this problem?
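
One hedged diagnostic (a sketch, with ckpt_dir and model as placeholders) is to diff the tensor names actually stored in a checkpoint's safetensors shards against the model's expected state-dict keys, to see which checkpoints are missing weights:

```python
# Sketch: list the tensors actually serialized in a checkpoint's safetensors
# shards so incomplete checkpoints can be detected early.
import glob
import os
from safetensors import safe_open

def saved_tensor_names(ckpt_dir):
    names = set()
    for shard in glob.glob(os.path.join(ckpt_dir, "*.safetensors")):
        with safe_open(shard, framework="pt") as f:
            names.update(f.keys())
    return names

# missing = set(model.state_dict().keys()) - saved_tensor_names(ckpt_dir)
```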

uRENu commented 3 months ago

Got it~

Three nodes with 24 A800 GPUs in total: fine-tuning the 70B model under --deepspeed default-zero3 runs into problems when saving and reloading the model. This is the save code:

from swift.utils import is_master
if is_master():
    model.save_pretrained(save_model_path, max_shard_size="5GB", safe_serialization=True)
    tokenizer.save_pretrained(save_model_path)

Saving triggers a "Removed shared tensor" message:

[INFO:swift] last_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49
[INFO:swift] best_model_checkpoint: /local/checkpoints/model_train_2171/models/miqu_70B/v0-20240408-213452/checkpoint-49
Removed shared tensor {'model.layers.74.mlp.up_proj.weight', 'model.layers.50.self_attn.q_proj.weight', 'model.layers.69.mlp.up_proj.weight', ... (hundreds more entries: the q/k/v/o_proj and mlp gate/up/down_proj weights of all 80 layers) ...} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading

Reloading afterwards with LlamaForCausalLM.from_pretrained(save_model_path) then fails:

size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32000, 8192])

I also tried saving with safe_serialization=False, but the full set of weight files still is not written.
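
For context on the torch.Size([0]) shapes: under ZeRO-3 each rank holds only a flattened shard of every parameter, and outside a gather context the module's parameters are empty placeholders, so a plain rank-0 model.save_pretrained() serializes those placeholders rather than the real weights. A minimal sketch of gathering before saving, reusing the names from the snippet above (model, tokenizer, save_model_path, is_master); it only illustrates the mechanism, since gathering every parameter of a 70B model onto one rank at once is unlikely to fit in memory:

import deepspeed
from swift.utils import is_master

# All ranks must enter the context together: GatheredParameters performs a
# collective all-gather that temporarily reassembles the full tensors.
state_dict = None
with deepspeed.zero.GatheredParameters(list(model.parameters()), modifier_rank=0):
    if is_master():
        # clone to CPU: the full tensors are re-partitioned on context exit
        state_dict = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}

if is_master():
    model.save_pretrained(save_model_path, state_dict=state_dict,
                          max_shard_size="5GB", safe_serialization=True)
    tokenizer.save_pretrained(save_model_path)

In practice, letting the DeepSpeed engine do the save (e.g. engine.save_16bit_model with stage3_gather_16bit_weights_on_model_save enabled in the ZeRO config) avoids the single-rank memory spike, since it consolidates the weights module by module.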

Directly using trainer.state.best_model_checkpoint as the final saved model after training seems to work.
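
That is consistent with the mechanism above: the Trainer writes its checkpoint directories through the DeepSpeed engine, which handles the ZeRO-3 shards, while a manual rank-0 save only sees that rank's empty placeholders. A minimal sketch of reloading from that path (assuming a transformers Trainer and the Llama-family model from this thread):

from transformers import AutoTokenizer, LlamaForCausalLM

best_ckpt = trainer.state.best_model_checkpoint  # e.g. ".../checkpoint-49"
model = LlamaForCausalLM.from_pretrained(best_ckpt)
tokenizer = AutoTokenizer.from_pretrained(best_ckpt)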

I also hit missing model weights when doing full-parameter fine-tuning of qwen-vl on 4 nodes with 32 GPUs: some checkpoints are complete while others are missing weights. How did you solve this?


Set training_args.save_only_model = False.
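
With save_only_model = False, the Trainer also saves the ZeRO partitioned optimizer/model states (the global_step*/ files) and drops DeepSpeed's zero_to_fp32.py helper into each checkpoint directory, so a complete fp32 state dict can be rebuilt offline even if the consolidated model file is incomplete. A sketch, assuming a DeepSpeed version of that era and an illustrative checkpoint path:

from deepspeed.utils.zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

ckpt_dir = "output/checkpoint-49"  # hypothetical path; point at your own checkpoint dir
# reconstructs full fp32 weights from the ZeRO-3 partitions saved alongside
convert_zero_checkpoint_to_fp32_state_dict(ckpt_dir, f"{ckpt_dir}/pytorch_model.bin")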