hsiangyuzhao / RCPS

official implementation of rectified contrastive pseudo supervision
MIT License
57 stars, 3 forks

Error when running the code #14

Closed: yyyyy-aa closed this issue 4 months ago

yyyyy-aa commented 5 months ago

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py --mixed --benchmark --task la --exp_name running --wandb --entity xxx
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
| distributed init (rank 0): env://

Semi-Supervised Medical Image Segmentation Training
Mixed Precision - True; CUDNN Benchmark - True; Num GPU - 1; Num Worker - 8
successfully loaded config file: {'MODEL': {'PROJECT_DIM': 64, 'LEAKY': True, 'NORM': 'BATCH'}, 'TRAIN': {'LR': 0.01, 'MOMENTUM': 0.9, 'DECAY': 0.0001, 'BURN_IN': 5, 'BURN': 0, 'RAMPUP': 100, 'EPOCHS': 100, 'BATCHSIZE': 1, 'SEED': 42, 'RATIO': 0.1, 'LOSS_TYPE': 1, 'SAMPLE_NUM': 400, 'BUFFER_SIZE': 1, 'CPS_RATIO': 0.1, 'CON_RATIO': 0.1}, 'TEST': {'BATCHSIZE': 4}}
Traceback (most recent call last):
  File "/home/chaijingwen/RCPS-main/train.py", line 184, in <module>
    main()
  File "/home/chaijingwen/RCPS-main/train.py", line 74, in main
    AddChanneld(keys=['image', 'label'], allow_missing_keys=True),
NameError: name 'AddChanneld' is not defined
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1208857) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/ccj/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-22_10:17:25
  host      : mvp-C621-WD12-IPMI
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1208857)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Hello, I got the error above when running train.py. Could you please help me figure out what the problem is?

hsiangyuzhao commented 5 months ago

It seems that the latest MONAI release has removed the "AddChanneld" API. Try "EnsureChannelFirstd" instead.
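For anyone hitting the same NameError, here is a minimal sketch of the swap that keeps train.py working on both older and newer MONAI versions. The helper name `make_add_channel` is hypothetical and not part of this repository. It also assumes the loaded arrays have no channel dimension, which is why `channel_dim="no_channel"` is passed to `EnsureChannelFirstd`; if your data already carries channel metadata, that argument may not be needed.

```python
def make_add_channel(transforms, keys, allow_missing_keys=True):
    """Build a transform that adds a leading channel dimension.

    `transforms` is the module to look the classes up in (normally
    `monai.transforms`). Hypothetical helper, not part of RCPS.
    """
    # Older MONAI releases still expose AddChanneld; use it when present.
    legacy = getattr(transforms, "AddChanneld", None)
    if legacy is not None:
        return legacy(keys=keys, allow_missing_keys=allow_missing_keys)

    # Newer releases: EnsureChannelFirstd replaces it. Assumption: the
    # input arrays have no channel axis, so we say so explicitly.
    return transforms.EnsureChannelFirstd(
        keys=keys,
        allow_missing_keys=allow_missing_keys,
        channel_dim="no_channel",
    )
```

In train.py this would be used as `import monai.transforms as mt` followed by `make_add_channel(mt, ['image', 'label'])` in place of the `AddChanneld(...)` entry at line 74. Replacing the call directly with `EnsureChannelFirstd(keys=['image', 'label'], allow_missing_keys=True, channel_dim='no_channel')` should have the same effect if you only target recent MONAI versions.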