Open WoooWZY opened 11 months ago
You could try converting the numpy computations in `__init__` to Tensor operations.
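Please check your MindSpore version: versions before 2.1 do not support running on the 910B. Support for the 910B is still being added incrementally, and for now a few extra environment variables are required:
Graph-mode execution on the 910B requires the GE backend compiler: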
export MS_ENABLE_GE=1
When running training, also add:
export MS_GE_TRAIN=1
This variable is not needed when running inference only.
If the original script saves checkpoints, it is best to also enable:
export MS_ENABLE_REF_MODE=1
When using PYNATIVE mode, set:
export MS_ENABLE_REF_MODE=1
export MS_DEV_FORCE_ACL=1
and the GE environment variables must not be set:
unset MS_ENABLE_GE
unset MS_GE_TRAIN
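For anyone launching from a Python wrapper rather than a shell, the settings above can be sketched roughly as follows. This is only an illustration of the rules stated in this comment: the helper names (`meets_min_version`, `plan_910b_env`, `apply_env`) and the mode labels `"graph"` / `"pynative"` are made up for the example and are not part of MindSpore. It also assumes the variables have to be in place before MindSpore is imported, which the comment above does not state explicitly.

```python
import os
import re


def meets_min_version(version, minimum=(2, 1)):
    """Return True if a version string such as '2.1.0' is >= minimum (major, minor)."""
    # Only the leading "major.minor" part is compared; suffixes like "rc1" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


def plan_910b_env(mode, training=True, saves_checkpoint=False):
    """Return (to_set, to_unset) following the settings listed in this thread.

    mode is "graph" or "pynative"; these labels are hypothetical names for this sketch.
    """
    if mode == "graph":
        to_set = {"MS_ENABLE_GE": "1"}  # graph mode goes through the GE backend
        if training:
            to_set["MS_GE_TRAIN"] = "1"  # not needed for inference only
        if saves_checkpoint:
            to_set["MS_ENABLE_REF_MODE"] = "1"
        return to_set, []
    if mode == "pynative":
        to_set = {"MS_ENABLE_REF_MODE": "1", "MS_DEV_FORCE_ACL": "1"}
        # The GE variables must not be set in PYNATIVE mode.
        return to_set, ["MS_ENABLE_GE", "MS_GE_TRAIN"]
    raise ValueError(f"unknown mode: {mode!r}")


def apply_env(to_set, to_unset, environ=os.environ):
    """Remove the variables in to_unset, then set the ones in to_set."""
    for name in to_unset:
        environ.pop(name, None)
    environ.update(to_set)
```

A typical call would be `apply_env(*plan_910b_env("pynative"))` at the very top of the launch script, with `meets_min_version(mindspore.__version__)` checked separately once MindSpore is imported.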
[ERROR] DEVICE(246,ffff807c8ac0,python):2023-08-10-12:33:06.836.235 [mindspore/ccsrc/runtime/device/kernel_runtime_manager.cc:136] WaitTaskFinishOnDevice] SyncStream failed, exception:The pointer[stream] is null.
C++ Call Stack: (For framework developers)
mindspore/ccsrc/runtime/device/kernel_runtime.cc:108 LockRuntime
Traceback (most recent call last):
File "train.py", line 290, in <module>
train(args)
File "train.py", line 116, in train
sync_bn=args.sync_bn,
File "/home/ma-user/modelarts/user-job-dir/mindyolo/mindyolo/models/model_factory.py", line 30, in create_model
model = create_fn(**model_args, **kwargs)
File "/home/ma-user/modelarts/user-job-dir/mindyolo/mindyolo/models/yolov7.py", line 54, in yolov7
model = YOLOv7(cfg=cfg, in_channels=in_channels, num_classes=num_classes, **kwargs)
File "/home/ma-user/modelarts/user-job-dir/mindyolo/mindyolo/models/yolov7.py", line 33, in __init__
self.reset_parameter()
File "/home/ma-user/modelarts/user-job-dir/mindyolo/mindyolo/models/yolov7.py", line 45, in reset_parameter
m.initialize_biases()
File "/home/ma-user/modelarts/user-job-dir/mindyolo/mindyolo/models/heads/yolov7_head.py", line 79, in initialize_biases
for mi, s in zip(m.m, m.stride): # from
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/tensor.py", line 382, in __getitem__
out = tensor_operator_registry.get('getitem')(self, index)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/composite/multitype_ops/_compile_utils.py", line 57, in _tensor_getitem
return _tensor_index_by_integer(self, index)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/composite/multitype_ops/_compile_utils.py", line 407, in _tensor_index_by_integer
return strided_slice(data, begin_strides, end_strides, step_strides, begin_mask, end_mask, 0, 0, shrink_axis_mask)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/composite/multitype_ops/_compile_utils.py", line 43, in strided_slice
return stridedslice(data, begin_strides, end_strides, step_strides)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 294, in __call__
return _run_op(self, self.name, args)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 98, in wrapper
results = fn(*arg, **kwargs)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 730, in _run_op
output = real_run_op(obj, op_name, args)
RuntimeError: Ascend kernel runtime initialization failed.
Ascend Error Message:
EL9999: Inner Error!
EL9999 [drv api] halSqCqAllocate failed: deviceId=0, drvRetCode=17![FUNC:NormalSqCqAllocate][FILE:npu_driver.cc][LINE:555]
[SqCqManage]Alloc sq cq fail, stream_id=1, retCode=0x7020022.[FUNC:AllocStreamSqCq][FILE:stream.cc][LINE:61]
[SqCqManage]Alloc sq cq fail, stream_id=1, retCode=0x7020022.[FUNC:Setup][FILE:stream.cc][LINE:676]
rtStreamCreateWithFlags execute failed, reason=[driver error:internal error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:49]
Solution: Please contact support engineer.
Framework Error Message: (For framework developers)
Create stream failed, ret:507899
C++ Call Stack: (For framework developers)
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:425 Init
mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_stream_manager.cc:96 CreateStreamWithFlags