modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
3.47k stars, 298 forks

Fine-tuning: DeepSpeed raises an error #1570

Closed: badmic closed this issue 3 weeks ago

badmic commented 1 month ago

Describe the bug
What the bug is and how to reproduce it, preferably with screenshots.

Fine-tuning script:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift sft \
    --custom_register_path /data/project/swift/examples/pytorch/llm/scripts/customs.py \
    --model_type model_base \
    --model_id_or_path /data/train/model-base \
    --sft_type full \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --ddp_backend nccl \
    --dataset /data/project/data/QA-chinese/process.jsonl \
    --num_train_epochs 2 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --gradient_checkpointing true \
    --batch_size 16 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 1000 \
    --save_steps 1000 \
    --save_total_limit 3 \
    --gradient_accumulation_steps 4 \
    --use_flash_attn true \
    --logging_steps 10 \
    --deepspeed default-zero2
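Side note on the script above: --gradient_accumulation_steps is passed twice (16, then 4). This is probably unrelated to the crash, but it makes the effective value ambiguous. Below is a minimal sketch for spotting repeated flags, assuming the command is available as a single-line string. find_duplicate_flags is a hypothetical helper and not part of swift, and the shortened cmd string is only an illustration.

```python
import shlex
from collections import defaultdict
from typing import Dict, List


def find_duplicate_flags(command: str) -> Dict[str, List[str]]:
    """Return {flag: [values...]} for every --flag that appears more than once."""
    tokens = shlex.split(command)
    seen = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok.startswith("--"):
            # Treat the next token as the value unless it is another flag.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            seen[tok].append("" if nxt.startswith("--") else nxt)
    return {flag: vals for flag, vals in seen.items() if len(vals) > 1}


# Shortened version of the reported command (illustrative only).
cmd = (
    "swift sft --sft_type full --batch_size 16 "
    "--gradient_accumulation_steps 16 --max_grad_norm 0.5 "
    "--gradient_accumulation_steps 4 --deepspeed default-zero2"
)
duplicates = find_duplicate_flags(cmd)
print(duplicates)
```

With argparse-style parsers the last occurrence usually wins, but whether swift behaves that way here is an assumption. Removing one of the two flags avoids the question.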

Error log:

rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 214, in _create_impl
rank6: return DictConfig(
rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 74, in __init__

rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 549, in _set_value
rank6: data = get_structured_config_data(value)
rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/_utils.py", line 233, in get_structured_config_data
rank6: return get_dataclass_data(obj)
rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/_utils.py", line 176, in get_dataclass_data
rank6: d[name] = _maybe_wrap(
rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 677, in _maybe_wrap
rank6: return _node_wrap(
rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 642, in _node_wrap
rank6: elif issubclass(type_, Enum):
rank6: TypeError: issubclass() arg 1 must be a class

Your hardware and system info
Write your system info here, such as CUDA version, OS, GPU model, and torch version.

Additional context
Add any other context about the problem here.

Jintao-Huang commented 1 month ago

I haven't seen this one before. It is probably an environment issue.

Jintao-Huang commented 1 month ago

Could you post more of the error output from further up? I'd like to see where the exception is raised.

badmic commented 1 month ago

@Jintao-Huang
100%|████████████████████████████████████████████████████████████████████████████| 9900/9900 [00:02<00:00, 3330.27it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3429.60it/s]
100%|████████████████████████████████████████████████████████████████████████████| 9900/9900 [00:03<00:00, 3263.10it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3359.93it/s]
100%|████████████████████████████████████████████████████████████████████████████| 9900/9900 [00:03<00:00, 3219.16it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3388.19it/s]
[INFO:swift] The SftArguments will be saved in: /data/project/ys/swift/output/DZJ6B_base/v7-20240801-201507/sft_args.json
[INFO:swift] The Seq2SeqTrainingArguments will be saved in: /data/project/ys/swift/output/DZJ6B_base/v7-20240801-201507/training_args.json
[INFO:swift] The logging file will be saved in: /data/project/ys/swift/output/DZJ6B_base/v7-20240801-201507/logging.jsonl
rank3: Traceback (most recent call last):
rank3: File "/data/project/ys/swift/swift/cli/sft.py", line 5, in <module>

rank3: File "/data/project/ys/swift/swift/utils/run_utils.py", line 27, in x_main
rank3: result = llm_x(args, **kwargs)
rank3: File "/data/project/ys/swift/swift/llm/sft.py", line 384, in llm_sft

rank3: File "/data/project/ys/swift/swift/trainers/mixin.py", line 522, in train
rank3: res = super().train(resume_from_checkpoint, *args, **kwargs)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
rank3: return inner_training_loop(
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/transformers/trainer.py", line 2098, in _inner_training_loop
rank3: model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/accelerate/accelerator.py", line 1303, in prepare
rank3: result = self._prepare_deepspeed(*args)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/accelerate/accelerator.py", line 1779, in _prepare_deepspeed
rank3: engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/deepspeed/__init__.py", line 171, in initialize
rank3: engine = DeepSpeedEngine(args=args,
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 237, in __init__

rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1017, in _do_sanity_check
rank3: expected_optim_types = self._supported_optims()
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1005, in _supported_optims
rank3: from fairseq.optim.fairseq_optimizer import FairseqOptimizer
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/__init__.py", line 33, in <module>
rank3: import fairseq.criterions # noqa
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/criterions/__init__.py", line 36, in <module>
rank3: importlib.import_module("fairseq.criterions." + file_name)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/importlib/__init__.py", line 126, in import_module
rank3: return _bootstrap._gcd_import(name[level:], package, level)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/criterions/ctc.py", line 19, in <module>
rank3: from fairseq.tasks import FairseqTask
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/tasks/__init__.py", line 15, in <module>
rank3: from .fairseq_task import FairseqTask, LegacyFairseqTask # noqa
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/tasks/fairseq_task.py", line 17, in <module>
rank3: from fairseq.optim.amp_optimizer import AMPOptimizer
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/optim/__init__.py", line 48, in <module>
rank3: importlib.import_module("fairseq.optim." + file_name)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/importlib/__init__.py", line 126, in import_module
rank3: return _bootstrap._gcd_import(name[level:], package, level)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/optim/composite.py", line 14, in <module>
rank3: from fairseq.optim.lr_scheduler import FairseqLRScheduler, build_lr_scheduler
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/optim/lr_scheduler/__init__.py", line 36, in <module>
rank3: importlib.import_module("fairseq.optim.lr_scheduler." + file_name)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/importlib/__init__.py", line 126, in import_module
rank3: return _bootstrap._gcd_import(name[level:], package, level)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/optim/lr_scheduler/tri_stage_lr_scheduler.py", line 51, in <module>
rank3: class TriStageLRSchedule(FairseqLRScheduler):
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/fairseq/registry.py", line 92, in register_x_cls
rank3: cs.store(name=name, group=registry_name, node=node, provider="fairseq")
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/hydra/core/config_store.py", line 85, in store
rank3: cfg = OmegaConf.structured(node)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 133, in structured
rank3: return OmegaConf.create(obj, parent)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 170, in create
rank3: return OmegaConf._create_impl(obj=obj, parent=parent)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 214, in _create_impl
rank3: return DictConfig(
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 74, in __init__

rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/dictconfig.py", line 549, in _set_value
rank3: data = get_structured_config_data(value)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/_utils.py", line 233, in get_structured_config_data
rank3: return get_dataclass_data(obj)
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/_utils.py", line 176, in get_dataclass_data
rank3: d[name] = _maybe_wrap(
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 677, in _maybe_wrap
rank3: return _node_wrap(
rank3: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 642, in _node_wrap
rank3: elif issubclass(type_, Enum):
rank3: TypeError: issubclass() arg 1 must be a class
rank6: Traceback (most recent call last):
rank6: File "/data/project/ys/swift/swift/cli/sft.py", line 5, in <module>

rank6: File "/data/project/ys/swift/swift/utils/run_utils.py", line 27, in x_main
rank6: result = llm_x(args, **kwargs)
rank6: File "/data/project/ys/swift/swift/llm/sft.py", line 384, in llm_sft

rank6: File "/data/project/ys/swift/swift/trainers/mixin.py", line 522, in train
rank6: res = super().train(resume_from_checkpoint, *args, **kwargs)
rank6: File "/root/miniconda3/envs/swift_cu/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train

badmic commented 1 month ago

This error shows up while the dataset is being processed.

badmic commented 1 month ago

Environment info:


absl-py 2.0.0 accelerate 0.33.0 addict 2.4.0 aiofiles 23.2.1 aiohttp 3.9.5 aioprometheus 23.3.0 aiosignal 1.3.1 aliyun-python-sdk-core 2.15.0 aliyun-python-sdk-kms 2.16.2 altair 5.2.0 annotated-types 0.6.0 antlr4-python3-runtime 4.8 anyio 4.2.0 appdirs 1.4.4 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 arxiv 2.1.0 asttokens 2.4.1 astunparse 1.6.3 async-lru 2.0.4 async-timeout 4.0.3 attrdict 2.0.1 attrs 23.1.0 auto_gptq 0.7.1 autoawq 0.2.6 autoawq_kernels 0.0.7 Babel 2.14.0 backoff 2.2.1 backports.strenum 1.3.1 beautifulsoup4 4.12.2 binpacking 1.5.2 bitarray 2.9.2 bitblas 0.0.1.dev13 bitsandbytes 0.41.3.post2 bleach 6.1.0 blessed 1.20.0 blinker 1.8.2 boto3 1.34.34 botocore 1.34.34 cachetools 5.3.2 certifi 2022.12.7 cffi 1.16.0 chardet 5.2.0 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 cmake 3.30.1 cn2an 0.5.22 colorama 0.4.6 coloredlogs 15.0.1 comm 0.2.0 contourpy 1.2.0 cpm-kernels 1.0.11 cpplint 1.6.1 crcmod 1.7 croniter 1.4.1 cryptography 42.0.5 cycler 0.12.1 Cython 3.0.10 dacite 1.8.1 DataProperty 1.0.1 datasets 2.18.0 dateutils 0.6.12 debugpy 1.8.0 decorator 5.1.1 decord 0.6.0 deepdiff 6.7.1 deepspeed 0.12.5 defusedxml 0.7.1 diffusers 0.25.0 dill 0.3.7 diskcache 5.6.3 distro 1.9.0 dnspython 2.6.1 docker-pycreds 0.4.0 docstring-parser 0.15 docutils 0.21.2 dropout-layer-norm 0.1 dtlib 0.0.0.dev2 editdistance 0.8.1 editor 1.6.6 einops 0.5.0 email_validator 2.2.0 et-xmlfile 1.1.0 evaluate 0.4.1 exceptiongroup 1.2.0 execnet 2.1.1 executing 2.0.1 fairscale 0.4.13 fairseq 0.12.2 fastapi 0.111.1 fastapi-cli 0.0.4 fastjsonschema 2.19.0 feedparser 6.0.10 ffmpy 0.3.1 filelock 3.15.4 fire 0.5.0 flash-attn 2.6.3 fonttools 4.47.2 fqdn 1.5.1 frozenlist 1.4.1 fsspec 2023.10.0 func_timeout 4.3.5 future 1.0.0 fuzzywuzzy 0.18.0 gekko 1.2.1 gitdb 4.0.11 GitPython 3.1.40 google-auth 2.25.2 google-auth-oauthlib 1.2.0 gradio 4.39.0 gradio_client 1.1.1 griffe 0.48.0 grpcio 1.60.0 h11 0.14.0 hf_transfer 0.1.6 hjson 3.1.0 hqq 0.1.8 httpcore 1.0.2 httptools 0.6.1 
httpx 0.26.0 huggingface-hub 0.23.5 humanfriendly 10.0 hydra-core 1.0.7 idna 3.4 imageio 2.34.2 immutabledict 4.2.0 importlib_metadata 8.2.0 importlib-resources 6.1.0 iniconfig 2.0.0 inquirer 3.2.3 interegular 0.3.3 ipdb 0.13.13 ipykernel 6.27.1 ipython 8.19.0 ipywidgets 8.1.1 isoduration 20.11.0 jedi 0.19.1 jieba 0.42.1 Jinja2 3.1.2 jmespath 0.10.0 joblib 1.3.2 json5 0.9.14 jsonargparse 4.27.4 jsonlines 4.0.0 jsonpointer 2.4 jsonschema 4.20.0 jsonschema-specifications 2023.11.2 jupyter 1.0.0 jupyter_client 8.6.0 jupyter-console 6.6.3 jupyter_core 5.5.1 jupyter-events 0.9.0 jupyter-lsp 2.2.1 jupyter_server 2.12.1 jupyter_server_terminals 0.5.0 jupyterlab 4.0.9 jupyterlab_pygments 0.3.0 jupyterlab_server 2.25.2 jupyterlab-widgets 3.0.9 kiwisolver 1.4.5 lagent 0.2.2 langdetect 1.0.9 lark 1.1.9 lazy_loader 0.4 Levenshtein 0.25.1 lightning 2.2.0.post0 lightning-cloud 0.5.64 lightning-utilities 0.10.1 llmuses 0.4.1 llvmlite 0.43.0 lm-eval 0.3.0 lm-format-enforcer 0.10.1 loguru 0.7.2 ltp 4.2.13 ltp-core 0.1.4 ltp-extension 0.1.13 lxml 5.2.1 Markdown 3.5.1 markdown-it-py 3.0.0 MarkupSafe 2.1.3 matplotlib 3.9.1 matplotlib-inline 0.1.6 mbstrdecoder 1.1.3 mdurl 0.1.2 mistune 3.0.2 ml-dtypes 0.4.0 mmengine 0.10.4 mmengine-lite 0.10.4 modelscope 1.16.1 more-itertools 10.2.0 mpmath 1.3.0 ms-opencompass 0.0.1 ms-swift 2.3.0.dev0 /data/project/ys/swift msgpack 1.0.7 multidict 6.0.4 multipledispatch 1.0.0 multiprocess 0.70.15 nbclient 0.9.0 nbconvert 7.13.1 nbformat 5.9.2 nest-asyncio 1.5.8 networkx 3.0 ninja 1.11.1 nltk 3.8 notebook 7.0.6 notebook_shim 0.2.3 numba 0.60.0 numexpr 2.10.0 numpy 1.26.4 nvidia-cublas-cu11 11.11.3.6 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu11 11.8.87 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvcc-cu11 11.8.89 nvidia-cuda-nvrtc-cu11 11.8.89 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu11 11.8.89 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu11 8.9.6.50 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu11 10.9.0.58 nvidia-cufft-cu12 
11.0.2.54 nvidia-curand-cu11 10.3.0.86 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu11 11.4.1.48 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu11 11.7.5.86 nvidia-cusparse-cu12 12.1.0.106 nvidia-ml-py 12.555.43 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.2 omegaconf 2.0.0 openai 1.37.1 OpenCC 1.1.7 opencv-python 4.10.0.84 opencv-python-headless 4.9.0.80 openpyxl 3.1.5 ordered-set 4.1.0 orjson 3.9.10 oss2 2.18.6 outlines 0.0.46 overrides 7.4.0 packaging 23.2 pandas 1.5.3 pandocfilters 1.5.0 parso 0.8.3 pathvalidate 3.2.0 peft 0.11.1 pexpect 4.9.0 phx-class-registry 4.1.0 Pillow 9.3.0 pip 24.1.2 platformdirs 4.1.0 plotly 5.23.0 pluggy 1.5.0 ply 3.11 portalocker 2.8.2 prettytable 3.10.0 proces 0.1.7 prometheus-client 0.19.0 prometheus-fastapi-instrumentator 7.0.0 prompt-toolkit 3.0.43 protobuf 4.23.4 psutil 5.9.7 ptyprocess 0.7.0 pure-eval 0.2.2 py-cpuinfo 9.0.0 pyairports 2.1.1 pyarrow 14.0.2 pyarrow-hotfix 0.6 pyasn1 0.5.1 pyasn1-modules 0.3.0 pybind11 2.12.0 pycountry 23.12.11 pycparser 2.21 pycryptodome 3.20.0 pydantic 2.6.1 pydantic_core 2.16.2 pydeck 0.9.1 pydub 0.25.1 pyext 0.7 Pygments 2.17.2 PyJWT 2.8.0 Pympler 1.1 pynvml 11.5.0 pyparsing 3.1.1 pypinyin 0.51.0 pytablewriter 1.2.0 pytest 8.3.2 pytest-xdist 3.6.1 python-dateutil 2.8.2 python-dotenv 1.0.0 python-json-logger 2.0.7 python-Levenshtein 0.25.1 python-multipart 0.0.9 pytorch-lightning 2.1.4 pytz 2023.3.post1 PyYAML 6.0.1 pyzmq 25.1.2 qtconsole 5.5.1 QtPy 2.4.1 quantile-python 1.1 rank-bm25 0.2.2 rapidfuzz 3.9.0 ray 2.33.0 readchar 4.0.5 referencing 0.32.0 regex 2023.10.3 requests 2.31.0 requests-oauthlib 1.3.1 requests-toolbelt 1.0.0 responses 0.18.0 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.4.2 rotary-emb 0.1 rouge 1.0.1 rouge-chinese 1.0.3 /root/miniconda3/envs/swift_cu/lib/python3.10/site-packages rouge-score 0.1.2 rpds-py 0.15.2 rsa 4.9 ruff 0.5.5 runs 1.2.2 s3transfer 0.10.0 sacrebleu 1.5.0 safetensors 0.4.3 scikit-image 0.24.0 
scikit-learn 1.2.1 scipy 1.11.4 seaborn 0.13.2 semantic-version 2.10.0 Send2Trash 1.8.2 sentence-transformers 2.2.2 sentencepiece 0.2.0 sentry-sdk 1.39.1 setproctitle 1.3.3 setuptools 69.5.1 sgmllib3k 1.0.0 shellingham 1.5.4 shtab 1.6.5 simple-ddl-parser 1.5.1 simplejson 3.19.2 six 1.16.0 smart-open 7.0.3 smmap 5.0.1 sniffio 1.3.0 sortedcontainers 2.4.0 soupsieve 2.5 spaces 0.22.0 sqlitedict 2.1.0 stack-data 0.6.3 stanford-stk 0.0.6 starlette 0.37.2 streamlit 1.37.0 sympy 1.12 tabledata 1.3.3 tabulate 0.9.0 tcolorpy 0.1.4 tempdir 0.7.1 tenacity 8.5.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 termcolor 2.4.0 terminado 0.18.0 thefuzz 0.22.1 threadpoolctl 3.4.0 tifffile 2024.7.24 tiktoken 0.7.0 timeout-decorator 0.5.0 tinycss2 1.2.1 tokenizers 0.19.1 toml 0.10.2 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.1 torch 2.3.0 torchaudio 2.1.2+cu118 torchmetrics 1.3.0.post0 torchvision 0.18.0 tornado 6.4 tqdm 4.64.1 tqdm-multiprocess 0.0.11 traitlets 5.14.0 transformers 4.43.0 transformers-stream-generator 0.0.5 triton 2.3.0 trl 0.9.6 typepy 1.3.2 typer 0.12.3 types-python-dateutil 2.8.19.14 typeshed-client 2.4.0 typing_extensions 4.12.2 tyro 0.6.0 tzdata 2023.3 uri-template 1.3.0 urllib3 2.0.7 uvicorn 0.30.3 uvloop 0.19.0 vllm 0.5.1 vllm-flash-attn 2.5.9 wandb 0.16.1 watchdog 4.0.1 watchfiles 0.21.0 wcwidth 0.2.12 webcolors 1.13 webencodings 0.5.1 websocket-client 1.7.0 websockets 11.0.3 Werkzeug 3.0.1 wheel 0.41.2 widgetsnbextension 4.0.9 wikiextractor 3.0.6 word2number 1.1 wrapt 1.16.0 xentropy-cuda-lib 0.1 xformers 0.0.26.post1 xmod 1.8.1 xtuner 0.1.23 xxhash 3.4.1 yapf 0.40.2 yarl 1.9.4 zipp 3.18.1 zstandard 0.22.0
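The full list above is long. For this error, the frames in the traceback only pass through deepspeed, fairseq, hydra and omegaconf, so those versions are the ones to compare first. Here is a minimal sketch that prints just those entries using the standard importlib.metadata module. installed_versions is a hypothetical helper name, and the list of suspect packages is an assumption based on the traceback.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distributions whose modules appear in the traceback frames.
suspects = ["deepspeed", "fairseq", "hydra-core", "omegaconf"]
for name, version in installed_versions(suspects).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it inside the same conda environment (swift_cu in the log) so that it reports the packages the training job actually imports.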

Jintao-Huang commented 1 month ago

pip install deepspeed -U

badmic commented 1 month ago

@Jintao-Huang pip install deepspeed -U didn't help. I still get the same error.

badmic commented 1 month ago

@Jintao-Huang I tracked it down: the problem is caused by the --deepspeed default-zero2 script argument. I still don't really know how to fix it, though.
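This matches the traceback earlier in the thread: the failing frames start in DeepSpeed's _supported_optims, which imports fairseq.optim.fairseq_optimizer, and the TypeError is raised inside omegaconf while fairseq registers its config classes. That code path is only reached when DeepSpeed is enabled. Below is a minimal sketch for checking the import in isolation, without launching an 8-GPU job. probe_import is a hypothetical helper, not part of any of these libraries.

```python
import importlib
from typing import Optional, Tuple


def probe_import(module_name: str) -> Tuple[str, Optional[str]]:
    """Try to import a module and classify the outcome.

    Returns ("ok", None) on success, ("missing", message) for ImportError,
    or ("broken", message) when the import raises any other exception.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return "missing", f"{type(exc).__name__}: {exc}"
    except Exception as exc:  # e.g. the TypeError seen in the log
        return "broken", f"{type(exc).__name__}: {exc}"
    return "ok", None


# The module that DeepSpeed's _supported_optims imports in the traceback.
status, detail = probe_import("fairseq.optim.fairseq_optimizer")
print(status, detail)
```

A "broken" result in this environment would point at the fairseq / hydra-core / omegaconf combination rather than at swift or DeepSpeed itself. If fairseq is not needed for this training run, uninstalling it or aligning those package versions is one thing to try. This is a suggestion based on the traceback, not a fix confirmed in the thread.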