Closed longkeyy closed 1 year ago
Model training and fine-tuning
Alpaca-2-7B
Linux
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=/opt/base_models/chinese-alpaca-2-7b-hf/
chinese_tokenizer_path=/opt/base_models/chinese-alpaca-2-7b-hf/
dataset_dir=/hy-tmp/data/pt
output_dir=/hy-tmp/output/pt
peft_model=path/to/peft/model/dir
data_cache=/hy-tmp/data/pt_tmp
per_device_train_batch_size=4
per_device_eval_batch_size=4
gradient_accumulation_steps=8

deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 1 \
    --save_steps 50 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 96 \
    --block_size 4096 \
    --output_dir ${output_dir} \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False \
    --flash_attn \
    --resume_from_checkpoint /hy-tmp/result/pt/checkpoint-156
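For readers checking the launch settings above, here is a rough Python sketch of the effective batch size they imply. The numbers are copied from the script; the variable names `world_size`, `sequences_per_step`, and `tokens_per_step` are illustrative only, and the token count assumes every training sequence is packed to exactly `block_size` tokens, which may not hold for the last block of a dataset.

```python
# Values copied from the launch script above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
nnodes = 1            # torchrun --nnodes
nproc_per_node = 1    # torchrun --nproc_per_node
block_size = 4096

# Number of training processes across all nodes.
world_size = nnodes * nproc_per_node

# Sequences consumed per optimizer step (after gradient accumulation).
sequences_per_step = (
    per_device_train_batch_size * gradient_accumulation_steps * world_size
)

# Assumption: each sequence holds exactly block_size tokens.
tokens_per_step = sequences_per_step * block_size

print(sequences_per_step, tokens_per_step)
```

With these settings each optimizer step covers 32 sequences, or 131072 tokens under the packing assumption.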
absl-py==1.4.0 accelerate==0.21.0 aiohttp==3.8.5 aiosignal==1.3.1 anyio==3.7.0 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arrow==1.2.3 asttokens==2.2.1 async-lru==2.0.2 async-timeout==4.0.3 attrs==23.1.0 Babel==2.12.1 backcall==0.2.0 beautifulsoup4==4.12.2 bitsandbytes==0.41.0 bleach==6.0.0 boltons @ file:///croot/boltons_1677628692245/work cachetools==5.3.1 certifi @ file:///croot/certifi_1683875369620/work/certifi cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work cmake==3.25.0 comm==0.1.3 conda @ file:///croot/conda_1685025188142/work conda-content-trust @ file:///tmp/abs_5952f1c8-355c-4855-ad2e-538535021ba5h26t22e5/croots/recipe/conda-content-trust_1658126371814/work conda-package-handling @ file:///croot/conda-package-handling_1666940373510/work contourpy==1.0.7 cryptography @ file:///croot/cryptography_1677533068310/work cycler==0.11.0 daal==2023.1.1 daal4py==2023.1.1 datasets==2.14.4 debugpy==1.6.7 decorator==5.1.1 deepspeed==0.10.1 defusedxml==0.7.1 dill==0.3.7 einops==0.6.1 exceptiongroup==1.1.1 executing==1.2.0 fastjsonschema==2.17.1 filelock==3.9.0 flash-attn @ file:///hy-tmp/flash_attn-2.0.8%2Bcu117torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl fonttools==4.39.4 fqdn==1.5.1 frozenlist==1.4.0 fsspec==2023.6.0 google-auth==2.19.0 google-auth-oauthlib==1.0.0 grpcio==1.54.2 hjson==3.1.0 huggingface-hub==0.16.4 idna @ file:///croot/idna_1666125576474/work importlib-metadata==6.6.0 importlib-resources==5.12.0 ipykernel==6.23.1 ipython==8.12.2 ipython-genutils==0.2.0 isoduration==20.11.0 jedi==0.18.2 Jinja2==3.1.2 joblib==1.2.0 json5==0.9.14 jsonpatch @ file:///tmp/build/80754af9/jsonpatch_1615747632069/work jsonpointer==2.1 jsonschema==4.17.3 jupyter-events==0.6.3 jupyter-lsp==2.2.0 jupyter_client==8.2.0 jupyter_core==5.3.0 jupyter_server==2.6.0 jupyter_server_terminals==0.4.4 jupyterlab==4.0.0 jupyterlab-language-pack-zh-CN==4.0.post0 
jupyterlab-pygments==0.2.2 jupyterlab_server==2.22.1 keras==2.12.0 kiwisolver==1.4.4 lit==15.0.7 Markdown==3.4.3 MarkupSafe==2.1.2 matplotlib==3.7.1 matplotlib-inline==0.1.6 mistune==2.0.5 mpmath==1.2.1 multidict==6.0.4 multiprocess==0.70.15 nbclassic==1.0.0 nbclient==0.8.0 nbconvert==7.4.0 nbformat==5.8.0 nest-asyncio==1.5.6 networkx==3.0 ninja==1.11.1 notebook==6.5.4 notebook_shim==0.2.3 numpy==1.24.1 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-ml-py==11.525.112 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 nvitop==1.1.2 oauthlib==3.2.2 overrides==7.3.1 packaging @ file:///croot/packaging_1678965309396/work pandas==2.0.2 pandocfilters==1.5.0 parso==0.8.3 peft @ file:///hy-tmp/llama2/training/peft pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.3.0 pkgutil_resolve_name==1.3.10 platformdirs==3.5.1 pluggy @ file:///tmp/build/80754af9/pluggy_1648042571233/work prometheus-client==0.17.0 prompt-toolkit==3.0.38 protobuf==4.23.2 psutil==5.9.5 ptyprocess==0.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 pyarrow==12.0.1 pyasn1==0.5.0 pyasn1-modules==0.3.0 pycosat @ file:///croot/pycosat_1666805502580/work pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work pydantic==1.10.12 Pygments==2.15.1 pyOpenSSL @ file:///croot/pyopenssl_1677607685877/work pyparsing==3.0.9 pyrsistent==0.19.3 PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work python-dateutil==2.8.2 python-json-logger==2.0.7 pytz==2023.3 PyYAML==6.0 pyzmq==25.1.0 regex==2023.8.8 requests @ file:///croot/requests_1682607517574/work requests-oauthlib==1.3.1 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rsa==4.9 ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work safetensors==0.3.2 
scikit-learn==1.3.0 scikit-learn-intelex==2023.1.1 scipy==1.10.1 Send2Trash==1.8.2 sentencepiece==0.1.97 six @ file:///tmp/build/80754af9/six_1644875935023/work sklearn==0.0.post5 sniffio==1.3.0 soupsieve==2.4.1 stack-data==0.6.2 sympy==1.11.1 tbb==2021.9.0 tensorboard==2.13.0 tensorboard-data-server==0.7.0 termcolor==2.3.0 terminado==0.17.1 threadpoolctl==3.1.0 tinycss2==1.2.1 tokenizers==0.13.3 tomli==2.0.1 toolz @ file:///croot/toolz_1667464077321/work torch==2.0.1 torchaudio==2.0.1+cu117 torchvision==0.15.1+cu117 tornado==6.3.2 tqdm @ file:///croot/tqdm_1679561862951/work traitlets==5.9.0 transformers==4.31.0 triton==2.0.0 typing_extensions==4.4.0 tzdata==2023.3 uri-template==1.2.0 urllib3==1.25.8 wcwidth==0.2.6 webcolors==1.13 webencodings==0.5.1 websocket-client==1.5.2 Werkzeug==2.3.4 xgboost==1.7.5 xxhash==3.3.0 yarl==1.9.2 zipp==3.15.0
[INFO|deepspeed.py:381] 2023-08-21 19:48:45,175 >> Attempting to resume from /hy-tmp/result/pt/checkpoint-156
[2023-08-21 19:48:45,176] [INFO] [torch_checkpoint_engine.py:27:load] [Torch] Loading checkpoint from /hy-tmp/result/pt/checkpoint-156/global_step156/mp_rank_00_model_states.pt...
[2023-08-21 19:48:51,071] [INFO] [torch_checkpoint_engine.py:29:load] [Torch] Loaded checkpoint from /hy-tmp/result/pt/checkpoint-156/global_step156/mp_rank_00_model_states.pt.
[2023-08-21 19:48:51,646] [INFO] [torch_checkpoint_engine.py:27:load] [Torch] Loading checkpoint from /hy-tmp/result/pt/checkpoint-156/global_step156/mp_rank_00_model_states.pt...
[2023-08-21 19:48:57,443] [INFO] [torch_checkpoint_engine.py:29:load] [Torch] Loaded checkpoint from /hy-tmp/result/pt/checkpoint-156/global_step156/mp_rank_00_model_states.pt.
Traceback (most recent call last):
  File "run_clm_pt_with_peft.py", line 637, in <module>
    main()
  File "run_clm_pt_with_peft.py", line 605, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1676, in _inner_training_loop
    deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/transformers/deepspeed.py", line 383, in deepspeed_load_checkpoint
    load_path, _ = deepspeed_engine.load_checkpoint(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2648, in load_checkpoint
    load_path, client_states = self._load_checkpoint(load_dir,
  File "/usr/local/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2707, in _load_checkpoint
    self.load_module_state_dict(checkpoint=checkpoint,
  File "/usr/local/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2511, in load_module_state_dict
    self.module.load_state_dict(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
Missing key(s) in state_dict: "base_model.model.model.layers.0.self_attn.q_proj.weight", "base_model.model.model.layers.0.self_attn.k_proj.weight", "base_model.model.model.layers.0.self_attn.v_proj.weight", "base_model.model.model.layers.0.self_attn.o_proj.weight", "base_model.model.model.layers.0.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.0.mlp.gate_proj.weight", "base_model.model.model.layers.0.mlp.up_proj.weight", "base_model.model.model.layers.0.mlp.down_proj.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.self_attn.q_proj.weight", "base_model.model.model.layers.1.self_attn.k_proj.weight", "base_model.model.model.layers.1.self_attn.v_proj.weight", "base_model.model.model.layers.1.self_attn.o_proj.weight", "base_model.model.model.layers.1.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.1.mlp.gate_proj.weight", "base_model.model.model.layers.1.mlp.up_proj.weight", "base_model.model.model.layers.1.mlp.down_proj.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.self_attn.q_proj.weight", "base_model.model.model.layers.2.self_attn.k_proj.weight", "base_model.model.model.layers.2.self_attn.v_proj.weight", "base_model.model.model.layers.2.self_attn.o_proj.weight", "base_model.model.model.layers.2.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.2.mlp.gate_proj.weight", "base_model.model.model.layers.2.mlp.up_proj.weight", "base_model.model.model.layers.2.mlp.down_proj.weight",
"base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.self_attn.q_proj.weight", "base_model.model.model.layers.3.self_attn.k_proj.weight", "base_model.model.model.layers.3.self_attn.v_proj.weight", "base_model.model.model.layers.3.self_attn.o_proj.weight", "base_model.model.model.layers.3.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.3.mlp.gate_proj.weight", "base_model.model.model.layers.3.mlp.up_proj.weight", "base_model.model.model.layers.3.mlp.down_proj.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.self_attn.q_proj.weight", "base_model.model.model.layers.4.self_attn.k_proj.weight", "base_model.model.model.layers.4.self_attn.v_proj.weight", "base_model.model.model.layers.4.self_attn.o_proj.weight", "base_model.model.model.layers.4.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.4.mlp.gate_proj.weight", "base_model.model.model.layers.4.mlp.up_proj.weight", "base_model.model.model.layers.4.mlp.down_proj.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.self_attn.q_proj.weight", "base_model.model.model.layers.5.self_attn.k_proj.weight", "base_model.model.model.layers.5.self_attn.v_proj.weight", "base_model.model.model.layers.5.self_attn.o_proj.weight", "base_model.model.model.layers.5.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.5.mlp.gate_proj.weight", "base_model.model.model.layers.5.mlp.up_proj.weight", "base_model.model.model.layers.5.mlp.down_proj.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.self_attn.q_proj.weight", 
"base_model.model.model.layers.6.self_attn.k_proj.weight", "base_model.model.model.layers.6.self_attn.v_proj.weight", "base_model.model.model.layers.6.self_attn.o_proj.weight", "base_model.model.model.layers.6.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.6.mlp.gate_proj.weight", "base_model.model.model.layers.6.mlp.up_proj.weight", "base_model.model.model.layers.6.mlp.down_proj.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", "base_model.model.model.layers.7.self_attn.q_proj.weight", "base_model.model.model.layers.7.self_attn.k_proj.weight", "base_model.model.model.layers.7.self_attn.v_proj.weight", "base_model.model.model.layers.7.self_attn.o_proj.weight", "base_model.model.model.layers.7.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.7.mlp.gate_proj.weight", "base_model.model.model.layers.7.mlp.up_proj.weight", "base_model.model.model.layers.7.mlp.down_proj.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.self_attn.q_proj.weight", "base_model.model.model.layers.8.self_attn.k_proj.weight", "base_model.model.model.layers.8.self_attn.v_proj.weight", "base_model.model.model.layers.8.self_attn.o_proj.weight", "base_model.model.model.layers.8.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.8.mlp.gate_proj.weight", "base_model.model.model.layers.8.mlp.up_proj.weight", "base_model.model.model.layers.8.mlp.down_proj.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.self_attn.q_proj.weight", "base_model.model.model.layers.9.self_attn.k_proj.weight", "base_model.model.model.layers.9.self_attn.v_proj.weight", "base_model.model.model.layers.9.self_attn.o_proj.weight", 
"base_model.model.model.layers.9.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.9.mlp.gate_proj.weight", "base_model.model.model.layers.9.mlp.up_proj.weight", "base_model.model.model.layers.9.mlp.down_proj.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.self_attn.q_proj.weight", "base_model.model.model.layers.10.self_attn.k_proj.weight", "base_model.model.model.layers.10.self_attn.v_proj.weight", "base_model.model.model.layers.10.self_attn.o_proj.weight", "base_model.model.model.layers.10.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.10.mlp.gate_proj.weight", "base_model.model.model.layers.10.mlp.up_proj.weight", "base_model.model.model.layers.10.mlp.down_proj.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.self_attn.q_proj.weight", "base_model.model.model.layers.11.self_attn.k_proj.weight", "base_model.model.model.layers.11.self_attn.v_proj.weight", "base_model.model.model.layers.11.self_attn.o_proj.weight", "base_model.model.model.layers.11.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.11.mlp.gate_proj.weight", "base_model.model.model.layers.11.mlp.up_proj.weight", "base_model.model.model.layers.11.mlp.down_proj.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.self_attn.q_proj.weight", "base_model.model.model.layers.12.self_attn.k_proj.weight", "base_model.model.model.layers.12.self_attn.v_proj.weight", "base_model.model.model.layers.12.self_attn.o_proj.weight", "base_model.model.model.layers.12.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.12.mlp.gate_proj.weight", "base_model.model.model.layers.12.mlp.up_proj.weight", 
"base_model.model.model.layers.12.mlp.down_proj.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.self_attn.q_proj.weight", "base_model.model.model.layers.13.self_attn.k_proj.weight", "base_model.model.model.layers.13.self_attn.v_proj.weight", "base_model.model.model.layers.13.self_attn.o_proj.weight", "base_model.model.model.layers.13.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.13.mlp.gate_proj.weight", "base_model.model.model.layers.13.mlp.up_proj.weight", "base_model.model.model.layers.13.mlp.down_proj.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.self_attn.q_proj.weight", "base_model.model.model.layers.14.self_attn.k_proj.weight", "base_model.model.model.layers.14.self_attn.v_proj.weight", "base_model.model.model.layers.14.self_attn.o_proj.weight", "base_model.model.model.layers.14.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.14.mlp.gate_proj.weight", "base_model.model.model.layers.14.mlp.up_proj.weight", "base_model.model.model.layers.14.mlp.down_proj.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", "base_model.model.model.layers.15.self_attn.q_proj.weight", "base_model.model.model.layers.15.self_attn.k_proj.weight", "base_model.model.model.layers.15.self_attn.v_proj.weight", "base_model.model.model.layers.15.self_attn.o_proj.weight", "base_model.model.model.layers.15.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.15.mlp.gate_proj.weight", "base_model.model.model.layers.15.mlp.up_proj.weight", "base_model.model.model.layers.15.mlp.down_proj.weight", "base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", 
"base_model.model.model.layers.16.self_attn.q_proj.weight", "base_model.model.model.layers.16.self_attn.k_proj.weight", "base_model.model.model.layers.16.self_attn.v_proj.weight", "base_model.model.model.layers.16.self_attn.o_proj.weight", "base_model.model.model.layers.16.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.16.mlp.gate_proj.weight", "base_model.model.model.layers.16.mlp.up_proj.weight", "base_model.model.model.layers.16.mlp.down_proj.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.self_attn.q_proj.weight", "base_model.model.model.layers.17.self_attn.k_proj.weight", "base_model.model.model.layers.17.self_attn.v_proj.weight", "base_model.model.model.layers.17.self_attn.o_proj.weight", "base_model.model.model.layers.17.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.17.mlp.gate_proj.weight", "base_model.model.model.layers.17.mlp.up_proj.weight", "base_model.model.model.layers.17.mlp.down_proj.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.self_attn.q_proj.weight", "base_model.model.model.layers.18.self_attn.k_proj.weight", "base_model.model.model.layers.18.self_attn.v_proj.weight", "base_model.model.model.layers.18.self_attn.o_proj.weight", "base_model.model.model.layers.18.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.18.mlp.gate_proj.weight", "base_model.model.model.layers.18.mlp.up_proj.weight", "base_model.model.model.layers.18.mlp.down_proj.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.self_attn.q_proj.weight", "base_model.model.model.layers.19.self_attn.k_proj.weight", "base_model.model.model.layers.19.self_attn.v_proj.weight", 
"base_model.model.model.layers.19.self_attn.o_proj.weight", "base_model.model.model.layers.19.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.19.mlp.gate_proj.weight", "base_model.model.model.layers.19.mlp.up_proj.weight", "base_model.model.model.layers.19.mlp.down_proj.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.self_attn.q_proj.weight", "base_model.model.model.layers.20.self_attn.k_proj.weight", "base_model.model.model.layers.20.self_attn.v_proj.weight", "base_model.model.model.layers.20.self_attn.o_proj.weight", "base_model.model.model.layers.20.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.20.mlp.gate_proj.weight", "base_model.model.model.layers.20.mlp.up_proj.weight", "base_model.model.model.layers.20.mlp.down_proj.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.self_attn.q_proj.weight", "base_model.model.model.layers.21.self_attn.k_proj.weight", "base_model.model.model.layers.21.self_attn.v_proj.weight", "base_model.model.model.layers.21.self_attn.o_proj.weight", "base_model.model.model.layers.21.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.21.mlp.gate_proj.weight", "base_model.model.model.layers.21.mlp.up_proj.weight", "base_model.model.model.layers.21.mlp.down_proj.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.self_attn.q_proj.weight", "base_model.model.model.layers.22.self_attn.k_proj.weight", "base_model.model.model.layers.22.self_attn.v_proj.weight", "base_model.model.model.layers.22.self_attn.o_proj.weight", "base_model.model.model.layers.22.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.22.mlp.gate_proj.weight", 
"base_model.model.model.layers.22.mlp.up_proj.weight", "base_model.model.model.layers.22.mlp.down_proj.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", "base_model.model.model.layers.23.self_attn.q_proj.weight", "base_model.model.model.layers.23.self_attn.k_proj.weight", "base_model.model.model.layers.23.self_attn.v_proj.weight", "base_model.model.model.layers.23.self_attn.o_proj.weight", "base_model.model.model.layers.23.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.23.mlp.gate_proj.weight", "base_model.model.model.layers.23.mlp.up_proj.weight", "base_model.model.model.layers.23.mlp.down_proj.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.self_attn.q_proj.weight", "base_model.model.model.layers.24.self_attn.k_proj.weight", "base_model.model.model.layers.24.self_attn.v_proj.weight", "base_model.model.model.layers.24.self_attn.o_proj.weight", "base_model.model.model.layers.24.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.24.mlp.gate_proj.weight", "base_model.model.model.layers.24.mlp.up_proj.weight", "base_model.model.model.layers.24.mlp.down_proj.weight", "base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.self_attn.q_proj.weight", "base_model.model.model.layers.25.self_attn.k_proj.weight", "base_model.model.model.layers.25.self_attn.v_proj.weight", "base_model.model.model.layers.25.self_attn.o_proj.weight", "base_model.model.model.layers.25.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.25.mlp.gate_proj.weight", "base_model.model.model.layers.25.mlp.up_proj.weight", "base_model.model.model.layers.25.mlp.down_proj.weight", "base_model.model.model.layers.25.input_layernorm.weight", 
"base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.self_attn.q_proj.weight", "base_model.model.model.layers.26.self_attn.k_proj.weight", "base_model.model.model.layers.26.self_attn.v_proj.weight", "base_model.model.model.layers.26.self_attn.o_proj.weight", "base_model.model.model.layers.26.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.26.mlp.gate_proj.weight", "base_model.model.model.layers.26.mlp.up_proj.weight", "base_model.model.model.layers.26.mlp.down_proj.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.self_attn.q_proj.weight", "base_model.model.model.layers.27.self_attn.k_proj.weight", "base_model.model.model.layers.27.self_attn.v_proj.weight", "base_model.model.model.layers.27.self_attn.o_proj.weight", "base_model.model.model.layers.27.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.27.mlp.gate_proj.weight", "base_model.model.model.layers.27.mlp.up_proj.weight", "base_model.model.model.layers.27.mlp.down_proj.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.self_attn.q_proj.weight", "base_model.model.model.layers.28.self_attn.k_proj.weight", "base_model.model.model.layers.28.self_attn.v_proj.weight", "base_model.model.model.layers.28.self_attn.o_proj.weight", "base_model.model.model.layers.28.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.28.mlp.gate_proj.weight", "base_model.model.model.layers.28.mlp.up_proj.weight", "base_model.model.model.layers.28.mlp.down_proj.weight", "base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.self_attn.q_proj.weight", "base_model.model.model.layers.29.self_attn.k_proj.weight", 
"base_model.model.model.layers.29.self_attn.v_proj.weight", "base_model.model.model.layers.29.self_attn.o_proj.weight", "base_model.model.model.layers.29.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.29.mlp.gate_proj.weight", "base_model.model.model.layers.29.mlp.up_proj.weight", "base_model.model.model.layers.29.mlp.down_proj.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.self_attn.q_proj.weight", "base_model.model.model.layers.30.self_attn.k_proj.weight", "base_model.model.model.layers.30.self_attn.v_proj.weight", "base_model.model.model.layers.30.self_attn.o_proj.weight", "base_model.model.model.layers.30.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.30.mlp.gate_proj.weight", "base_model.model.model.layers.30.mlp.up_proj.weight", "base_model.model.model.layers.30.mlp.down_proj.weight", "base_model.model.model.layers.30.input_layernorm.weight", "base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.self_attn.q_proj.weight", "base_model.model.model.layers.31.self_attn.k_proj.weight", "base_model.model.model.layers.31.self_attn.v_proj.weight", "base_model.model.model.layers.31.self_attn.o_proj.weight", "base_model.model.model.layers.31.self_attn.rotary_emb.inv_freq", "base_model.model.model.layers.31.mlp.gate_proj.weight", "base_model.model.model.layers.31.mlp.up_proj.weight", "base_model.model.model.layers.31.mlp.down_proj.weight", "base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight", "base_model.model.model.norm.weight". 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4986) of binary: /usr/local/miniconda3/bin/python
Traceback (most recent call last):
  File "/usr/local/miniconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run_clm_pt_with_peft.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-21_19:49:02
  host      : I1496abdf480080174e
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4986)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
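Every missing key in the log above is a frozen base-model weight. No LoRA adapter weights appear, and neither do the `embed_tokens` or `lm_head` modules. That pattern is consistent with a checkpoint that stores only the trainable parameters, although the log alone does not prove it. A minimal Python sketch for summarising such an error message follows. The helper `summarize_missing_keys` is hypothetical and not part of the training scripts, and it assumes the error text is available as a string and that LoRA parameter names contain `lora_`.

```python
import re
from collections import Counter

# Quoted parameter names as they appear in the RuntimeError message.
KEY_PATTERN = re.compile(r'"([A-Za-z0-9_.]+)"')


def summarize_missing_keys(error_text):
    """Hypothetical helper: return (total, lora_count, counts_by_kind)."""
    # Keep only the text after the "Missing key(s)" marker.
    _, _, tail = error_text.partition("Missing key(s) in state_dict:")
    # Ignore a trailing "Unexpected key(s)" section if one is present.
    tail = tail.split("Unexpected key(s) in state_dict:")[0]
    keys = KEY_PATTERN.findall(tail)
    # Assumption: LoRA adapter parameters have "lora_" in their names.
    lora_keys = [k for k in keys if "lora_" in k]
    # Replace the layer index with "*" so keys group by parameter kind.
    by_kind = Counter(
        re.sub(r"\.layers\.\d+\.", ".layers.*.", k) for k in keys
    )
    return len(keys), len(lora_keys), by_kind


# Shortened sample in the same shape as the log above.
sample = (
    "RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM: "
    "Missing key(s) in state_dict: "
    '"base_model.model.model.layers.0.self_attn.q_proj.weight", '
    '"base_model.model.model.layers.1.self_attn.q_proj.weight", '
    '"base_model.model.model.norm.weight". '
)

total, lora_count, by_kind = summarize_missing_keys(sample)
print(total, lora_count)
for kind, count in sorted(by_kind.items()):
    print(count, kind)
```

A `lora_count` of zero alongside a large total suggests the mismatch lies in how the checkpoint was saved or loaded, not in the adapter configuration.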
You can refer to the related issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
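The following items must be checked before submitting
Issue type
Model training and fine-tuning
Base model
Alpaca-2-7B
Operating system
Linux
Detailed description of the problem
Dependencies (must be provided for code-related issues)
Run logs or screenshots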