VeitchG commented 1 week ago

Help!!! During fine-tuning I hit NotImplementedError: data_seed requires Accelerate version accelerate >= 1.1.0. This is not supported and we recommend you to update your version. However, the latest accelerate release right now is only 1.0.1.

VeitchG commented 1 week ago

I get the following error while running the script. Could someone please take a look?

run sh: /root/miniconda3/envs/Qwen2-VL/bin/python -m torch.distributed.run --nproc_per_node 2 /root/Qwen2-VL/ms-swift/swift/cli/sft.py --model_type qwen2-vl-7b-instruct --model_id_or_path qwen/Qwen2-VL-7B-Instruct --sft_type lora --dataset test1.jsonl#20000 --deepspeed default-zero3

WARNING:main:

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

[INFO:swift] Successfully registered /root/Qwen2-VL/ms-swift/swift/llm/data/dataset_info.json

/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version' from vllm.version import version as VLLM_VERSION

/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version' from vllm.version import version as VLLM_VERSION

[ERROR:swift] import vllm_utils error: Invalid version: 'dev'

[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'

[ERROR:swift] import vllm_utils error: Invalid version: 'dev'

[INFO:swift] Start time of running main: 2024-10-17 19:17:59.587550

[INFO:swift] Setting template_type: qwen2-vl

[INFO:swift] Using deepspeed: {'fp16': {'enabled': 'auto', 'loss_scale': 0, 'loss_scale_window': 1000, 'initial_scale_power': 16, 'hysteresis': 2, 'min_loss_scale': 1}, 'bf16': {'enabled': 'auto'}, 'optimizer': {'type': 'AdamW', 'params': {'lr': 'auto', 'betas': 'auto', 'eps': 'auto', 'weight_decay': 'auto'}}, 'scheduler': {'type': 'WarmupCosineLR', 'params': {'total_num_steps': 'auto', 'warmup_num_steps': 'auto'}}, 'zero_optimization': {'stage': 3, 'offload_optimizer': {'device': 'none', 'pin_memory': True}, 'offload_param': {'device': 'none', 'pin_memory': True}, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': 2000, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False}

[INFO:swift] Setting args.lazy_tokenize: True

[INFO:swift] Setting args.dataloader_num_workers: 1

[2024-10-17 19:17:59,804] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)

[2024-10-17 19:17:59,935] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)

[2024-10-17 19:18:00,460] [INFO] [comm.py:652:init_distributed] cdb=None

[2024-10-17 19:18:00,460] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

rank0: Traceback (most recent call last):

rank0: File "/root/Qwen2-VL/ms-swift/swift/cli/sft.py", line 5, in

rank0: File "/root/Qwen2-VL/ms-swift/swift/utils/run_utils.py", line 22, in x_main

rank0: args, remaining_argv = parse_args(args_class, argv)

rank0: File "/root/Qwen2-VL/ms-swift/swift/utils/utils.py", line 131, in parse_args

rank0: args, remaining_args = parser.parse_args_into_dataclasses(argv, return_remaining_strings=True)

rank0: File "/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/transformers/hf_argparser.py", line 352, in parse_args_into_dataclasses

rank0: obj = dtype(**inputs)

rank0: File "", line 215, in init

rank0: File "/root/Qwen2-VL/ms-swift/swift/llm/utils/argument.py", line 1151, in __post_init__

rank0: File "/root/Qwen2-VL/ms-swift/swift/llm/utils/argument.py", line 1203, in _init_training_args

rank0: training_args = training_args_cls(

rank0: File "", line 144, in init

rank0: File "/root/Qwen2-VL/ms-swift/swift/trainers/arguments.py", line 39, in __post_init__

rank0: File "/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/transformers/training_args.py", line 2083, in __post_init__

rank0: raise NotImplementedError(

rank0: NotImplementedError: data_seed requires Accelerate version accelerate >= 1.1.0. This is not supported and we recommend you to update your version.

[2024-10-17 19:18:00,609] [INFO] [comm.py:652:init_distributed] cdb=None

rank1: Traceback (most recent call last):

rank1: File "/root/Qwen2-VL/ms-swift/swift/cli/sft.py", line 5, in

rank1: File "/root/Qwen2-VL/ms-swift/swift/utils/run_utils.py", line 22, in x_main

rank1: args, remaining_argv = parse_args(args_class, argv)

rank1: File "/root/Qwen2-VL/ms-swift/swift/utils/utils.py", line 131, in parse_args

rank1: args, remaining_args = parser.parse_args_into_dataclasses(argv, return_remaining_strings=True)

rank1: File "/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/transformers/hf_argparser.py", line 352, in parse_args_into_dataclasses

rank1: obj = dtype(**inputs)

rank1: File "", line 215, in init

rank1: File "/root/Qwen2-VL/ms-swift/swift/llm/utils/argument.py", line 1151, in __post_init__

rank1: File "/root/Qwen2-VL/ms-swift/swift/llm/utils/argument.py", line 1203, in _init_training_args

rank1: training_args = training_args_cls(

rank1: File "", line 144, in init

rank1: File "/root/Qwen2-VL/ms-swift/swift/trainers/arguments.py", line 39, in __post_init__

api.py", line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/root/Qwen2-VL/ms-swift/swift/cli/sft.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2024-10-17_19:18:01
  host : instance-mmufurso
  rank : 0 (local_rank: 0)
  exitcode : 1 (pid: 43832)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Echo0125 commented 1 week ago

Your transformers version is too new; 4.52.2 works fine.

lemonzjk commented 5 days ago

There is no 4.52.2 either; the latest is 4.46.0.
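Since both the error message and the replies above come down to which transformers and accelerate versions are actually installed in the environment, here is a minimal sketch for checking that with only the standard library. The helper names (`version_tuple`, `meets_minimum`, `installed_version`) are made up for illustration, and the comparison is a simplified numeric one that ignores pre-release suffixes; it is not the check that transformers itself performs.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '4.46.0.dev0' -> (4, 46, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Simplified check: compare numeric components, padding with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("transformers", "accelerate"):
        print(name, installed_version(name))
    accelerate_version = installed_version("accelerate")
    if accelerate_version is not None:
        # 1.1.0 is the minimum quoted in the error message above.
        print("accelerate >= 1.1.0:", meets_minimum(accelerate_version, "1.1.0"))
```

Run it with the same interpreter that launches the training script (here `/root/miniconda3/envs/Qwen2-VL/bin/python`), so that it reports the versions of that environment rather than of another one.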

Jintao-Huang commented 5 days ago

https://github.com/modelscope/ms-swift/issues/2339

modelscope / ms-swift

Help!!! During fine-tuning I hit NotImplementedError: data_seed requires Accelerate version `accelerate` >= 1.1.0. This is not supported and we recommend you to update your version #2277
