While running the script I got the error below; could anyone help take a look?
run sh: /root/miniconda3/envs/Qwen2-VL/bin/python -m torch.distributed.run --nproc_per_node 2 /root/Qwen2-VL/ms-swift/swift/cli/sft.py --model_type qwen2-vl-7b-instruct --model_id_or_path qwen/Qwen2-VL-7B-Instruct --sft_type lora --dataset test1.jsonl#20000 --deepspeed default-zero3
WARNING:__main__:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[INFO:swift] Successfully registered /root/Qwen2-VL/ms-swift/swift/llm/data/dataset_info.json
/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
from vllm.version import version as VLLM_VERSION
/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
from vllm.version import version as VLLM_VERSION
[ERROR:swift] import vllm_utils error: Invalid version: 'dev'
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'
[ERROR:swift] import vllm_utils error: Invalid version: 'dev'
[INFO:swift] Start time of running main: 2024-10-17 19:17:59.587550
[INFO:swift] Setting template_type: qwen2-vl
[INFO:swift] Using deepspeed: {'fp16': {'enabled': 'auto', 'loss_scale': 0, 'loss_scale_window': 1000, 'initial_scale_power': 16, 'hysteresis': 2, 'min_loss_scale': 1}, 'bf16': {'enabled': 'auto'}, 'optimizer': {'type': 'AdamW', 'params': {'lr': 'auto', 'betas': 'auto', 'eps': 'auto', 'weight_decay': 'auto'}}, 'scheduler': {'type': 'WarmupCosineLR', 'params': {'total_num_steps': 'auto', 'warmup_num_steps': 'auto'}}, 'zero_optimization': {'stage': 3, 'offload_optimizer': {'device': 'none', 'pin_memory': True}, 'offload_param': {'device': 'none', 'pin_memory': True}, 'overlap_comm': True, 'contiguous_gradients': True, 'sub_group_size': 1000000000.0, 'reduce_bucket_size': 'auto', 'stage3_prefetch_bucket_size': 'auto', 'stage3_param_persistence_threshold': 'auto', 'stage3_max_live_parameters': 1000000000.0, 'stage3_max_reuse_distance': 1000000000.0, 'stage3_gather_16bit_weights_on_model_save': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': 2000, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False}
[INFO:swift] Setting args.lazy_tokenize: True
[INFO:swift] Setting args.dataloader_num_workers: 1
[2024-10-17 19:17:59,804] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-10-17 19:17:59,935] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-10-17 19:18:00,460] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-10-17 19:18:00,460] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
rank0: Traceback (most recent call last):
rank0:   File "/root/Qwen2-VL/ms-swift/swift/cli/sft.py", line 5, in <module>
rank0:   File "/root/Qwen2-VL/ms-swift/swift/utils/run_utils.py", line 22, in x_main
rank0:     args, remaining_argv = parse_args(args_class, argv)
rank0:   File "/root/Qwen2-VL/ms-swift/swift/utils/utils.py", line 131, in parse_args
rank0:     args, remaining_args = parser.parse_args_into_dataclasses(argv, return_remaining_strings=True)
rank0:   File "/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/transformers/hf_argparser.py", line 352, in parse_args_into_dataclasses
rank0:     obj = dtype(**inputs)
rank0:   File "
rank0:   File "/root/Qwen2-VL/ms-swift/swift/llm/utils/argument.py", line 1203, in _init_training_args
rank0:     training_args = training_args_cls(
rank0:   File "
rank0:   File "/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/transformers/training_args.py", line 2083, in __post_init__
rank0:     raise NotImplementedError(
rank0: NotImplementedError: data_seed requires Accelerate version `accelerate` >= 1.1.0. This is not supported and we recommend you to update your version.
[2024-10-17 19:18:00,609] [INFO] [comm.py:652:init_distributed] cdb=None
rank1: Traceback (most recent call last):
rank1:   File "/root/Qwen2-VL/ms-swift/swift/cli/sft.py", line 5, in <module>
rank1:   File "/root/Qwen2-VL/ms-swift/swift/utils/run_utils.py", line 22, in x_main
rank1:     args, remaining_argv = parse_args(args_class, argv)
rank1:   File "/root/Qwen2-VL/ms-swift/swift/utils/utils.py", line 131, in parse_args
rank1:     args, remaining_args = parser.parse_args_into_dataclasses(argv, return_remaining_strings=True)
rank1:   File "/root/miniconda3/envs/Qwen2-VL/lib/python3.12/site-packages/transformers/hf_argparser.py", line 352, in parse_args_into_dataclasses
rank1:     obj = dtype(**inputs)
rank1:   File "
rank1:   File "/root/Qwen2-VL/ms-swift/swift/llm/utils/argument.py", line 1203, in _init_training_args
rank1:     training_args = training_args_cls(
rank1:   File "
rank1: NotImplementedError: data_seed requires Accelerate version `accelerate` >= 1.1.0. This is not supported and we recommend you to update your version.
W1017 19:18:01.449000 140032046114624 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 43833 closing signal SIGTERM
E1017 19:18:01.563000 140032046114624 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 43832) of binary: /root/miniconda3/envs/Qwen2-VL/bin/python
Traceback (most recent call last):
  File "
Failures:
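For reference, one quick way to confirm which transformers and accelerate versions that environment actually has (a rough sketch that reuses the interpreter path from the launch command above; the installed versions are not shown in the log):

```shell
# Check the versions involved in the error, using the env from the launch command above.
/root/miniconda3/envs/Qwen2-VL/bin/python -c "import transformers; print('transformers', transformers.__version__)"
/root/miniconda3/envs/Qwen2-VL/bin/python -c "import accelerate; print('accelerate', accelerate.__version__)"
# pip's view of the same packages
/root/miniconda3/envs/Qwen2-VL/bin/python -m pip show transformers accelerate
```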
Your transformers version is too high; 4.52.2 works fine.
There is no 4.52.2, though; the highest available is 4.46.0.
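A minimal sketch of the workaround discussed above, assuming (as the thread implies) that the data_seed/Accelerate check first appears in transformers 4.46.0 and that no accelerate release >= 1.1.0 is available yet; the exact pins are illustrative, not confirmed in this thread:

```shell
# Option 1: stay below the transformers release that enforces accelerate >= 1.1.0 for data_seed
pip install "transformers<4.46"

# Option 2: once an accelerate >= 1.1.0 release exists, satisfy the check instead
# pip install -U "accelerate>=1.1.0"
```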
Help!!! While fine-tuning I ran into NotImplementedError: data_seed requires Accelerate version `accelerate` >= 1.1.0. This is not supported and we recommend you to update your version, but the latest accelerate available right now is only 1.0.1.