What environment is this? The errors above are also worth a look.
A cloud environment.
mpirun noticed that process rank 0 with PID 0 on node notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60 exited on signal 11 (Segmentation fault).
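When a rank dies on signal 11, the Python-level location of the crash is usually lost. A minimal sketch, assuming you can edit the entry script (train.py here), is to enable the standard-library faulthandler module at the very top so that each rank dumps its Python stack to stderr when it receives SIGSEGV:

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python traceback
# of every thread to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)

if __name__ == "__main__":
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect is available without editing the script by setting PYTHONFAULTHANDLER=1 in the environment or launching with python -X faulthandler.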
Try running on a single card first to see whether the code itself has a problem. If that works, the environment's openmpi or multiprocessing packages are probably the wrong versions.
OK, single-card training runs fine. The package version is:
multiprocess 0.70.12.2
I don't see an openmpi package. Does it need to be installed separately? The mpirun command itself works.
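Open MPI is typically installed at the system level (or through conda), not as a pip package, so it will not appear in pip list even when mpirun works. A small sketch to see which mpirun is on PATH and what it reports; it assumes only that the binary accepts --version, which Open MPI's mpirun does:

```python
import shutil
import subprocess
from typing import Optional


def mpirun_version_text() -> Optional[str]:
    """Return the first line of `mpirun --version`, or None if mpirun is not on PATH."""
    exe = shutil.which("mpirun")
    if exe is None:
        return None
    # Open MPI prints a first line like "mpirun (Open MPI) 4.0.6".
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print("mpirun path:", shutil.which("mpirun"))
    print("mpirun version:", mpirun_version_text())
```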
paddlenlp 2.5.2 requires multiprocess<=0.70.12.2, so this is the matching version.
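To check an installed package against an upper bound such as multiprocess<=0.70.12.2, a minimal sketch using only the standard library is below. It assumes purely numeric dotted versions; pre-release or local tags (e.g. ".dev0") would need packaging.version instead.

```python
from importlib import metadata
from typing import Optional, Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    """Convert a purely numeric dotted version such as '0.70.12.2' to (0, 70, 12, 2)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_max(installed: str, maximum: str) -> bool:
    """Return True if `installed` <= `maximum` (numeric dotted versions only)."""
    return as_tuple(installed) <= as_tuple(maximum)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("multiprocess")
    print("multiprocess:", found)
    if found is not None:
        # Constraint quoted in this thread for paddlenlp 2.5.2.
        print("satisfies <=0.70.12.2:", satisfies_max(found, "0.70.12.2"))
```

For example, satisfies_max("0.70.15", "0.70.12.2") is False, so upgrading multiprocess to 0.70.15 breaks that paddlenlp pin.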
I upgraded mindspore from 2.0 to 2.2, and now running it fails with this error:
RuntimeError: Unsupported device target Ascend. This process only supports one of the ['CPU']. Please check whether the Ascend environment is installed and configured correctly, and check whether current mindspore wheel package was built with "-e Ascend". For details, please refer to "Device load error message".
Load dynamic library: libmindspore_ascend.so.2 failed. /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.2: undefined symbol: _ZTVN2ge11ModelHelperE Load dynamic library: libmindspore_ascend.so.1 failed. /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.1: undefined symbol: _ZTVN2ge11ModelHelperE
mindspore/core/utils/ms_context.cc:355 SetDeviceTargetFromInner
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[24669,1],
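The symbol _ZTVN2ge11ModelHelperE demangles to "vtable for ge::ModelHelper", which suggests the MindSpore Ascend plugin was built against a different CANN (graph engine) version than the one installed. A quick way to reproduce the loader error outside of training is to dlopen the plugin directly. This is a sketch: the path is copied from the log above, and the shell must have the same Ascend environment variables (LD_LIBRARY_PATH etc.) as the training run, otherwise unrelated missing-dependency errors may appear instead.

```python
import ctypes
from typing import Optional, Tuple


def try_dlopen(path: str) -> Tuple[bool, Optional[str]]:
    """Try to load a shared library; return (ok, loader error message or None)."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The message carries the dynamic loader error, e.g. "undefined symbol: ...".
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    plugin = ("/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/"
              "mindspore/lib/plugin/libmindspore_ascend.so.2")
    print(try_dlopen(plugin))
```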
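The MindSpore version has to match the driver version on the notebook's compute node for it to run. Since single-card training works, you don't need to upgrade MindSpore.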
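To compare the two versions on a node, a sketch that reads the installed MindSpore version and the Ascend driver version is below. The driver file location and its "Version=..." line format are assumptions about a typical install; the actual compatible pairs must be taken from MindSpore's official installation guide, not inferred from this script.

```python
from importlib import metadata
from pathlib import Path
from typing import Dict, Optional

# ASSUMPTION: typical Ascend driver install location; adjust for your node.
DRIVER_VERSION_FILE = Path("/usr/local/Ascend/driver/version.info")


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines, skipping lines without '='."""
    out: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            out[key.strip()] = value.strip()
    return out


def driver_version(path: Path = DRIVER_VERSION_FILE) -> Optional[str]:
    """Return the 'Version' entry of the driver version file, or None if unavailable."""
    if not path.is_file():
        return None
    return parse_key_values(path.read_text()).get("Version")


def mindspore_version() -> Optional[str]:
    """Return the installed mindspore wheel version without importing mindspore."""
    try:
        return metadata.version("mindspore")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("mindspore:", mindspore_version())
    print("driver:", driver_version())
```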
Why does this need the paddlenlp library at all?
Sorry, that was a mistake. I upgraded multiprocess to 0.70.15 and it still doesn't work.
For package versions, try following this: https://github.com/mindspore-lab/mindyolo/blob/master/docs/en/installation.md
OK, I checked and my local mpirun (Open MPI) is version 4.0.6, which is indeed a bit different. I'll reinstall and try again.
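When comparing the local Open MPI against the version listed in the installation guide, it can help to extract the version as a tuple. A minimal sketch, assuming the first line of mpirun --version has the "mpirun (Open MPI) X.Y.Z" form shown in this comment:

```python
import re
from typing import Optional, Tuple

# Matches the first line printed by Open MPI's `mpirun --version`,
# e.g. "mpirun (Open MPI) 4.0.6".
_OMPI_VERSION = re.compile(r"\(Open MPI\)\s+(\d+)\.(\d+)\.(\d+)")


def parse_ompi_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `mpirun --version` output, or None."""
    match = _OMPI_VERSION.search(text)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


print(parse_ompi_version("mpirun (Open MPI) 4.0.6"))  # (4, 0, 6)
```

The returned tuple can then be compared with the version from the guide; non-Open-MPI launchers (e.g. MPICH's Hydra) return None.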
Still not working for now. Here is the full error log:
mpirun --allow-run-as-root -n 2 python train.py --config ./configs/yolov8/yolov8n1.yaml --is_parallel True
[WARNING] DEVICE(3627895,ffffa0f680b0,python):2023-11-15-16:39:19.924.005 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_stream_assign.cc:1762] InsertEventCommonDependHcom] Hcom node:Default/Broadcast-op5, can't find target for insert recv op, no insert send/recv
[WARNING] DEVICE(3627895,ffffa0f680b0,python):2023-11-15-16:39:19.924.081 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_stream_assign.cc:1690] GraphLoopSync] There is no event between computing stream and hcom stream in graph 0 need insert event.
[WARNING] DEVICE(3627893,ffff9829b0b0,python):2023-11-15-16:39:20.437.552 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_stream_assign.cc:1762] InsertEventCommonDependHcom] Hcom node:Default/Broadcast-op5, can't find target for insert recv op, no insert send/recv
[WARNING] DEVICE(3627893,ffff9829b0b0,python):2023-11-15-16:39:20.437.624 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_stream_assign.cc:1690] GraphLoopSync] There is no event between computing stream and hcom stream in graph 0 need insert event.
2023-11-15 16:39:20,995 [INFO] parse_args:
2023-11-15 16:39:20,995 [INFO] task detect
2023-11-15 16:39:20,995 [INFO] device_target Ascend
2023-11-15 16:39:20,995 [INFO] save_dir ./runs/2023.11.15-16.38.57
2023-11-15 16:39:20,995 [INFO] device_per_servers 8
2023-11-15 16:39:20,995 [INFO] log_level INFO
2023-11-15 16:39:20,995 [INFO] is_parallel True
2023-11-15 16:39:20,995 [INFO] ms_mode 0
2023-11-15 16:39:20,995 [INFO] ms_amp_level O0
2023-11-15 16:39:20,995 [INFO] keep_loss_fp32 True
2023-11-15 16:39:20,995 [INFO] ms_loss_scaler static
2023-11-15 16:39:20,995 [INFO] ms_loss_scaler_value 1024.0
2023-11-15 16:39:20,995 [INFO] ms_jit True
2023-11-15 16:39:20,995 [INFO] ms_enable_graph_kernel False
2023-11-15 16:39:20,995 [INFO] ms_datasink False
2023-11-15 16:39:20,995 [INFO] overflow_still_update True
2023-11-15 16:39:20,995 [INFO] clip_grad False
2023-11-15 16:39:20,995 [INFO] clip_grad_value 10.0
2023-11-15 16:39:20,995 [INFO] ema True
2023-11-15 16:39:20,995 [INFO] weight
2023-11-15 16:39:20,995 [INFO] ema_weight
2023-11-15 16:39:20,995 [INFO] freeze []
2023-11-15 16:39:20,995 [INFO] epochs 100
2023-11-15 16:39:20,995 [INFO] per_batch_size 16
2023-11-15 16:39:20,995 [INFO] img_size 640
2023-11-15 16:39:20,995 [INFO] nbs 64
2023-11-15 16:39:20,995 [INFO] accumulate 1
2023-11-15 16:39:20,995 [INFO] auto_accumulate False
2023-11-15 16:39:20,995 [INFO] log_interval 100
2023-11-15 16:39:20,995 [INFO] single_cls False
2023-11-15 16:39:20,995 [INFO] sync_bn True
2023-11-15 16:39:20,995 [INFO] keep_checkpoint_max 100
2023-11-15 16:39:20,995 [INFO] run_eval False
2023-11-15 16:39:20,995 [INFO] conf_thres 0.001
2023-11-15 16:39:20,995 [INFO] iou_thres 0.7
2023-11-15 16:39:20,995 [INFO] conf_free True
2023-11-15 16:39:20,995 [INFO] rect False
2023-11-15 16:39:20,995 [INFO] nms_time_limit 20.0
2023-11-15 16:39:20,995 [INFO] recompute False
2023-11-15 16:39:20,995 [INFO] recompute_layers 0
2023-11-15 16:39:20,995 [INFO] seed 2
2023-11-15 16:39:20,995 [INFO] summary True
2023-11-15 16:39:20,995 [INFO] profiler False
2023-11-15 16:39:20,995 [INFO] profiler_step_num 1
2023-11-15 16:39:20,995 [INFO] opencv_threads_num 0
2023-11-15 16:39:20,995 [INFO] strict_load True
2023-11-15 16:39:20,995 [INFO] enable_modelarts False
2023-11-15 16:39:20,995 [INFO] data_url
2023-11-15 16:39:20,995 [INFO] ckpt_url
2023-11-15 16:39:20,995 [INFO] multi_data_url
2023-11-15 16:39:20,995 [INFO] pretrain_url
2023-11-15 16:39:20,995 [INFO] train_url
2023-11-15 16:39:20,995 [INFO] data_dir /cache/data/
2023-11-15 16:39:20,995 [INFO] ckpt_dir /cache/pretrain_ckpt/
2023-11-15 16:39:20,995 [INFO] data.dataset_name gesture
2023-11-15 16:39:20,995 [INFO] data.train_set /home/ma-user/work/loong/yolo/Rock_paper_scissor_test/train/images
2023-11-15 16:39:20,995 [INFO] data.val_set /home/ma-user/work/loong/yolo/Rock_paper_scissor_test/valid/images
2023-11-15 16:39:20,995 [INFO] data.test_set /home/ma-user/work/loong/yolo/Rock_paper_scissor_test/test/images
2023-11-15 16:39:20,995 [INFO] data.nc 3
2023-11-15 16:39:20,995 [INFO] data.names ['Paper', 'Rock', 'Scissor']
2023-11-15 16:39:20,995 [INFO] roboflow.workspace sambhavs-vision
2023-11-15 16:39:20,995 [INFO] roboflow.project rock-paper-scissor-odf1i
2023-11-15 16:39:20,995 [INFO] roboflow.version 2
2023-11-15 16:39:20,995 [INFO] roboflow.license CC BY 4.0
2023-11-15 16:39:20,995 [INFO] roboflow.url https://universe.roboflow.com/sambhavs-vision/rock-paper-scissor-odf1i/dataset/2
2023-11-15 16:39:20,995 [INFO] data.num_parallel_workers 4
2023-11-15 16:39:20,995 [INFO] train_transforms.stage_epochs [90, 10]
2023-11-15 16:39:20,995 [INFO] train_transforms.trans_list [[{'func_name': 'mosaic', 'prob': 1.0}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0}, {'func_name': 'albumentations'}, {'func_name': 'hsv_augment', 'prob': 1.0, 'hgain': 0.015, 'sgain': 0.7, 'vgain': 0.4}, {'func_name': 'fliplr', 'prob': 0.5}, {'func_name': 'label_norm', 'xyxy2xywh_': True}, {'func_name': 'label_pad', 'padding_size': 160, 'padding_value': -1}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}], [{'func_name': 'letterbox', 'scaleup': True}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0}, {'func_name': 'albumentations'}, {'func_name': 'hsv_augment', 'prob': 1.0, 'hgain': 0.015, 'sgain': 0.7, 'vgain': 0.4}, {'func_name': 'fliplr', 'prob': 0.5}, {'func_name': 'label_norm', 'xyxy2xywh_': True}, {'func_name': 'label_pad', 'padding_size': 160, 'padding_value': -1}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}]]
2023-11-15 16:39:20,995 [INFO] data.test_transforms [{'func_name': 'letterbox', 'scaleup': False, 'only_image': True}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}]
2023-11-15 16:39:20,995 [INFO] optimizer.optimizer momentum
2023-11-15 16:39:20,995 [INFO] optimizer.lr_init 0.01
2023-11-15 16:39:20,995 [INFO] optimizer.momentum 0.937
2023-11-15 16:39:20,995 [INFO] optimizer.nesterov True
2023-11-15 16:39:20,995 [INFO] optimizer.loss_scale 1.0
2023-11-15 16:39:20,995 [INFO] optimizer.warmup_epochs 3
2023-11-15 16:39:20,995 [INFO] optimizer.warmup_momentum 0.8
2023-11-15 16:39:20,995 [INFO] optimizer.warmup_bias_lr 0.1
2023-11-15 16:39:20,995 [INFO] optimizer.min_warmup_step 1000
2023-11-15 16:39:20,995 [INFO] optimizer.group_param yolov8
2023-11-15 16:39:20,995 [INFO] optimizer.gp_weight_decay 0.0005
2023-11-15 16:39:20,995 [INFO] optimizer.start_factor 1.0
2023-11-15 16:39:20,995 [INFO] optimizer.end_factor 0.01
2023-11-15 16:39:20,995 [INFO] optimizer.epochs 100
2023-11-15 16:39:20,995 [INFO] optimizer.nbs 64
2023-11-15 16:39:20,995 [INFO] optimizer.accumulate 1
2023-11-15 16:39:20,995 [INFO] optimizer.total_batch_size 32
2023-11-15 16:39:20,995 [INFO] loss.name YOLOv8Loss
2023-11-15 16:39:20,995 [INFO] loss.box 7.5
2023-11-15 16:39:20,995 [INFO] loss.cls 0.5
2023-11-15 16:39:20,995 [INFO] loss.dfl 1.5
2023-11-15 16:39:20,995 [INFO] loss.reg_max 16
2023-11-15 16:39:20,995 [INFO] network.model_name yolov8
2023-11-15 16:39:20,995 [INFO] network.reg_max 16
2023-11-15 16:39:20,995 [INFO] network.stride [8, 16, 32]
2023-11-15 16:39:20,995 [INFO] network.backbone [[-1, 1, 'ConvNormAct', [64, 3, 2]], [-1, 1, 'ConvNormAct', [128, 3, 2]], [-1, 3, 'C2f', [128, True]], [-1, 1, 'ConvNormAct', [256, 3, 2]], [-1, 6, 'C2f', [256, True]], [-1, 1, 'ConvNormAct', [512, 3, 2]], [-1, 6, 'C2f', [512, True]], [-1, 1, 'ConvNormAct', [1024, 3, 2]], [-1, 3, 'C2f', [1024, True]], [-1, 1, 'SPPF', [1024, 5]]]
2023-11-15 16:39:20,995 [INFO] network.head [[-1, 1, 'Upsample', ['None', 2, 'nearest']], [[-1, 6], 1, 'Concat', [1]], [-1, 3, 'C2f', [512]], [-1, 1, 'Upsample', ['None', 2, 'nearest']], [[-1, 4], 1, 'Concat', [1]], [-1, 3, 'C2f', [256]], [-1, 1, 'ConvNormAct', [256, 3, 2]], [[-1, 12], 1, 'Concat', [1]], [-1, 3, 'C2f', [512]], [-1, 1, 'ConvNormAct', [512, 3, 2]], [[-1, 9], 1, 'Concat', [1]], [-1, 3, 'C2f', [1024]], [[15, 18, 21], 1, 'YOLOv8Head', ['nc', 'reg_max', 'stride']]]
2023-11-15 16:39:20,995 [INFO] network.depth_multiple 0.33
2023-11-15 16:39:20,995 [INFO] network.width_multiple 0.25
2023-11-15 16:39:20,995 [INFO] network.max_channels 1024
2023-11-15 16:39:20,995 [INFO] config ./configs/yolov8/yolov8n1.yaml
2023-11-15 16:39:20,995 [INFO] rank 0
2023-11-15 16:39:20,995 [INFO] rank_size 2
2023-11-15 16:39:20,995 [INFO] total_batch_size 32
2023-11-15 16:39:20,995 [INFO] callback []
2023-11-15 16:39:20,995 [INFO]
2023-11-15 16:39:20,998 [INFO] Please check the above information for the configurations
2023-11-15 16:39:21,000 [INFO] Parse model with Sync BN.
2023-11-15 16:39:28,676 [WARNING] Parse Model, args: nearest, keep str type
2023-11-15 16:39:29,865 [WARNING] Parse Model, args: nearest, keep str type
2023-11-15 16:39:38,316 [INFO] number of network params, total: 3.021836M, trainable: 3.011417M
2023-11-15 16:40:00,928 [WARNING] Parse Model, args: nearest, keep str type
2023-11-15 16:40:02,104 [WARNING] Parse Model, args: nearest, keep str type
2023-11-15 16:40:10,514 [INFO] number of network params, total: 3.021836M, trainable: 3.011417M
2023-11-15 16:40:26,499 [INFO] ema_weight not exist, default pretrain weight is currently used.
2023-11-15 16:40:26,550 [INFO] Dataset Cache file hash/version check success.
2023-11-15 16:40:26,551 [INFO] Load dataset cache from [/home/ma-user/work/loong/yolo/Rock_paper_scissor_test/train/labels.cache.npy] success.
Scanning '/home/ma-user/work/loong/yolo/Rock_paper_scissor_test/train/labels.cache.npy' images and labels... 1318 found, 0 missing,
2023-11-15 16:40:26,555 [INFO] Dataloader num parallel workers: [4]
2023-11-15 16:40:26,615 [INFO] Dataset Cache file hash/version check success.
2023-11-15 16:40:26,615 [INFO] Load dataset cache from [/home/ma-user/work/loong/yolo/Rock_paper_scissor_test/train/labels.cache.npy] success.
Scanning '/home/ma-user/work/loong/yolo/Rock_paper_scissor_test/train/labels.cache.npy' images and labels... 1318 found, 0 missing,
2023-11-15 16:40:26,618 [INFO] Dataloader num parallel workers: [4]
Scanning '/home/ma-user/work/loong/yolo/Rock_paper_scissor_test/train/labels.cache.npy' images and labels... 1318 found, 0 missing,
Scanning '/home/ma-user/work/loong/yolo/Rock_paper_scissor_test/train/labels.cache.npy' images and labels... 1318 found, 0 missing,
2023-11-15 16:40:32,300 [INFO] Registry(name=callback, total=4)
2023-11-15 16:40:32,300 [INFO] (0): YoloxSwitchTrain in mindyolo/utils/callback.py
2023-11-15 16:40:32,300 [INFO] (1): EvalWhileTrain in mindyolo/utils/callback.py
2023-11-15 16:40:32,300 [INFO] (2): SummaryCallback in mindyolo/utils/callback.py
2023-11-15 16:40:32,300 [INFO] (3): ProfilerCallback in mindyolo/utils/callback.py
2023-11-15 16:40:32,300 [INFO]
2023-11-15 16:40:33,442 [INFO] got 1 active callback as follows:
2023-11-15 16:40:33,443 [INFO] SummaryCallback()
2023-11-15 16:40:33,443 [WARNING] log interval should be less than total steps of one epoch, but got 100 > 41, set log_interval as steps_per_epoch 41
2023-11-15 16:40:33,443 [WARNING] The first epoch will be compiled for the graph, which may take a long time; You can come back later :).
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] *** Process received signal ***
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] Signal: Segmentation fault (11)
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] Signal code: Address not mapped (1)
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] Failing at address: 0xb8
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffff982a77c0]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 1] /usr/local/Ascend/ascend-toolkit/latest/lib64/libhcom_graph_adaptor.so(_ZN4hccl22HcomOpsKernelInfoStore19GetCommFromTaskInfoERKN2ge10GETaskInfoERl+0x40)[0xffff555fc5b4]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 2] /usr/local/Ascend/ascend-toolkit/latest/lib64/libhcom_graph_adaptor.so(_ZN4hccl22HcomOpsKernelInfoStore10UnloadTaskERN2ge10GETaskInfoE+0x474)[0xffff55648fa8]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 3] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.1(+0x30fde90)[0xffff7b011e90]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 4] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.1(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0xb4)[0xffff7aef1c04]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 5] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.1(+0x30f466c)[0xffff7b00866c]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 6] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.1(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0xb4)[0xffff7aef1c04]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 7] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.1(+0x30f2d84)[0xffff7b006d84]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 8] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/plugin/libmindspore_ascend.so.1(+0x306d0a0)[0xffff7af810a0]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [ 9] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/libmindspore_backend.so(_ZN9mindspore6device20KernelRuntimeManager18ClearGraphResourceEj+0xa0)[0xffff8e362310]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [10] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/libmindspore_backend.so(_ZN9mindspore7session11KernelGraphD1Ev+0xb4)[0xffff8e02caf4]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [11] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0xb4)[0xffff93647a74]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [12] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/lib/libmindspore_backend.so(_ZN9mindspore7session14KernelGraphMgrD2Ev+0x39c)[0xffff8e05fa4c]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [13] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0xb4)[0xffff93647a74]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [14] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0xb4)[0xffff93647a74]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [15] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(_ZN9mindspore7compile17MindRTBackendBaseD1Ev+0x64)[0xffff93f484ec]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [16] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0xb4)[0xffff93647a74]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [17] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(+0x25347b8)[0xffff93e367b8]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [18] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0xb4)[0xffff93647a74]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [19] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(+0x250a78c)[0xffff93e0c78c]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [20] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(+0x250aca0)[0xffff93e0cca0]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [21] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(+0x2520dc4)[0xffff93e22dc4]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [22] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(+0x2500b48)[0xffff93e02b48]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [23] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(+0x1d5f340)[0xffff93661340]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [24] /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/_c_expression.cpython-39-aarch64-linux-gnu.so(+0x1d5bb5c)[0xffff9365db5c]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [25] python(+0x20e3bc)[0xaaaae50f73bc]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [26] python(_PyObject_MakeTpCall+0xa0)[0xaaaae4f66cb0]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [27] python(+0x1f8e18)[0xaaaae50e1e18]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [28] python(_PyEval_EvalFrameDefault+0x5ec0)[0xaaaae4f55f30]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] [29] python(+0x1134c0)[0xaaaae4ffc4c0]
[notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60:3627893] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
[WARNING] ME(3633368:281473346101424,WriterPool-31):2023-11-15-17:02:57.691.745 [mindspore/train/summary/_writer_pool.py:192] The training process 3627893 has exited, summary process will exit.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/utils/multiprocess_util.py", line 60, in run
key, func, args, kwargs = self.task_q.get(timeout=TIMEOUT)
File "<string>", line 2, in get
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/multiprocessing/managers.py", line 810, in _callmethod
kind, result = conn.recv()
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/multiprocessing/connection.py", line 255, in recv
buf = self._recv_bytes()
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
buf = self._recv(4)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
raise EOFError
EOFError
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node notebook-2a7fcf5e-9744-41c7-9c1c-5e37eeafdf60 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
(MindSpore) [ma-user mindyolo]$ echo $DEVICE_ID
0,1
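The non-fatal warning "log interval should be less than total steps of one epoch, but got 100 > 41" in the log follows from the printed config. A worked sketch of that arithmetic, assuming the last incomplete batch is dropped (which is consistent with the 41 in the warning):

```python
def total_batch_size(per_batch_size: int, rank_size: int) -> int:
    """Global batch across all ranks."""
    return per_batch_size * rank_size


def steps_per_epoch(num_images: int, batch: int) -> int:
    # Assumes the final partial batch is dropped.
    return num_images // batch


def effective_log_interval(log_interval: int, steps: int) -> int:
    # Mirrors the warning: an interval longer than one epoch is clamped to steps_per_epoch.
    return min(log_interval, steps)


batch = total_batch_size(16, 2)        # per_batch_size 16, rank_size 2
steps = steps_per_epoch(1318, batch)   # 1318 images found in the train set
print(batch, steps, effective_log_interval(100, steps))
```

This prints 32 41 41, matching "total_batch_size 32" and the clamped log_interval. The result is the same if the images are split per rank first (659 // 16 = 41).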
@zhanghuiyao Also, my setup is MindSpore 2.0.0 on an Ascend 910B. Could this be caused by a compatibility issue?
How is it going? What's the problem?
I suggest first running python -c "import mindspore;mindspore.set_context(device_target='Ascend');mindspore.run_check()" to check whether MindSpore is installed correctly.
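The same check written as a small script, so it can report a missing install or an unsupported device target (the RuntimeError quoted earlier in this thread) instead of stopping with a traceback. It uses only mindspore.set_context, mindspore.run_check and mindspore.__version__; returning True means run_check completed, and its printed message should still be read.

```python
def check_mindspore(device_target: str = "Ascend") -> bool:
    """Run MindSpore's built-in install check on the given device target.

    Returns False if MindSpore cannot be imported or the target cannot be set.
    """
    try:
        import mindspore
    except ImportError as exc:
        print("mindspore is not importable:", exc)
        return False
    try:
        mindspore.set_context(device_target=device_target)
    except Exception as exc:  # e.g. "Unsupported device target Ascend ..."
        print("cannot set device_target:", exc)
        return False
    print("mindspore version:", mindspore.__version__)
    mindspore.run_check()  # prints whether the install works on this platform
    return True


if __name__ == "__main__":
    check_mindspore("Ascend")
```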
Closing this issue for now. If you still run into problems, please open a new issue or reopen this one and provide the relevant information.