Please tell us what kind of hardware can reproduce your error (the backend type on which the error occurs):
[x] Ascend
Software Environment
MindSpore version (please tell us the MindSpore version you are using):
[x] 2.2.11
Python version(e.g., 3.7.5): 3.7.5
OS (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
GCC/Compiler version: 7.5.0
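Describe the current behavior
Following the LoRA fine-tuning guide, single-card training works as expected.
After switching to multi-card training, it fails with the error below.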
Finish preparing normal sample in 1 attempt(s)
Dataloader num parallel workers: [16]
scheduler_config not exist, train with base_lr 0.0001 and lr_scaler 1.0
[-1, -1, -1, -1, -1, 31, 63, 63, 223, 383, 543, 703, 863, 1023, 1055, 1087, 1119, 1119, 1119, 1119, 1119]
Traceback (most recent call last):
  File "/data/sdtest/mindone/examples/stable_diffusion_xl/train.py", line 693, in <module>
    train(args)
  File "/data/sdtest/mindone/examples/stable_diffusion_xl/train.py", line 239, in train
    ms.set_auto_parallel_context(all_reduce_fusion_config=all_reduce_fusion_config)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/_checkparam.py", line 1313, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/context.py", line 876, in set_auto_parallel_context
    _set_auto_parallel_context(**kwargs)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/_checkparam.py", line 1313, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/parallel/_auto_parallel_context.py", line 1275, in _set_auto_parallel_context
    set_func(value)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/parallel/_auto_parallel_context.py", line 626, in set_all_reduce_fusion_split_indices
    raise ValueError("The indices has duplicate elements")
ValueError: The indices has duplicate elements
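The ValueError above matches the list printed just before the traceback, which repeats -1, 63 and 1119. As a possible workaround (untested on Ascend), the list could be cleaned before it is passed to ms.set_auto_parallel_context. The helper name sanitize_split_indices is hypothetical and not part of mindone or MindSpore. Dropping the negative entries is also an assumption, since this report does not show how train.py builds the list or what the -1 values mean. Whether the cleaned split points still give a sensible fusion grouping for the LoRA parameters would need to be checked separately.

```python
from typing import Iterable, List


def sanitize_split_indices(indices: Iterable[int]) -> List[int]:
    """Return the split indices as a sorted list without repeats.

    Hypothetical helper: negative entries are treated as placeholders
    and dropped, which is an assumption rather than documented behavior.
    """
    return sorted({i for i in indices if i >= 0})


# The list reported in this issue (output of print(all_reduce_fusion_config)).
raw = [-1, -1, -1, -1, -1, 31, 63, 63, 223, 383, 543, 703, 863, 1023,
       1055, 1087, 1119, 1119, 1119, 1119, 1119]

clean = sanitize_split_indices(raw)
print(clean)
# [31, 63, 223, 383, 543, 703, 863, 1023, 1055, 1087, 1119]

# In train.py the cleaned list would then replace the original one, e.g.:
# ms.set_auto_parallel_context(all_reduce_fusion_config=clean)
```

This only sidesteps the duplicate check. It does not explain why the LoRA configuration produces repeated indices in the first place.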
Here, the list
[-1, -1, -1, -1, -1, 31, 63, 63, 223, 383, 543, 703, 863, 1023, 1055, 1087, 1119, 1119, 1119, 1119, 1119]
printed before the traceback is the output of print(all_reduce_fusion_config).
Describe the expected behavior
Please describe the expected outputs or the functions you want to have: LoRA fine-tuning should support multi-machine, multi-card training.