mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

AttributeError: 'Namespace' object has no attribute 'disable_allreduce_for_logging' #542

Closed dimanzt closed 1 year ago

dimanzt commented 2 years ago

Hi all, I'm trying to run object_detection. I built the containerized image, but when I run the code I hit the following error. Does this mean the code did not run correctly? How can I tell whether it is running correctly?

/workspace# ./run_and_time.sh
/workspace/pytorch /workspace
:::MLLOG {"namespace": "", "time_ms": 1648590976687, "event_type": "INTERVAL_START", "key": "init_start", "value": null, "metadata": {"file": "tools/train_mlperf.py", "lineno": 216}}
:::MLLOG {"namespace": "", "time_ms": 1648590976748, "event_type": "POINT_IN_TIME", "key": "seed", "value": 1847111940, "metadata": {"file": "tools/train_mlperf.py", "lineno": 263}}
2022-03-29 21:56:16,760 maskrcnn_benchmark INFO: Using 1 GPUs
2022-03-29 21:56:16,760 maskrcnn_benchmark INFO: Namespace(config_file='configs/e2e_mask_rcnn_R_50_FPN_1x.yaml', distributed=False, local_rank=0, opts=['SOLVER.IMS_PER_BATCH', '2', 'TEST.IMS_PER_BATCH', '10', 'SOLVER.MAX_ITER', '720000', 'SOLVER.STEPS', '(480000, 640000)', 'SOLVER.BASE_LR', '0.0025'], seed=1847111940)
2022-03-29 21:56:16,760 maskrcnn_benchmark INFO: Worker 0: Setting seed 3032881680
2022-03-29 21:56:16,761 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2022-03-29 21:56:18,722 maskrcnn_benchmark INFO:
PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 510.47.03
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip] numpy==1.18.5
[pip] torch==1.6.0
[pip] torchvision==0.2.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.1.0 py37h23d657b_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.18.5 py37ha1c710e_0
[conda] numpy-base 1.18.5 py37hde5b4d6_0
[conda] pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchvision 0.2.2 pypi_0 pypi
Pillow (7.2.0)
2022-03-29 21:56:18,722 maskrcnn_benchmark INFO: Loaded configuration file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
2022-03-29 21:56:18,722 maskrcnn_benchmark INFO:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
    OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.02
  WEIGHT_DECAY: 0.0001
  STEPS: (60000, 80000)
  MAX_ITER: 90000

2022-03-29 21:56:18,723 maskrcnn_benchmark INFO: Running with config: DATALOADER: ASPECT_RATIO_GROUPING: True NUM_WORKERS: 4 SIZE_DIVISIBILITY: 32 DATASETS: TEST: ('coco_2017_val',) TRAIN: ('coco_2017_train',) INPUT: MAX_SIZE_TEST: 1333 MAX_SIZE_TRAIN: 1333 MIN_SIZE_TEST: 800 MIN_SIZE_TRAIN: (800,) PIXEL_MEAN: [102.9801, 115.9465, 122.7717] PIXEL_STD: [1.0, 1.0, 1.0] TO_BGR255: True MLPERF: MIN_BBOX_MAP: 0.377 MIN_SEGM_MAP: 0.339 MODEL: BACKBONE: CONV_BODY: R-50-FPN FREEZE_CONV_BODY_AT: 2 OUT_CHANNELS: 256 USE_GN: False CLS_AGNOSTIC_BBOX_REG: False DEVICE: cuda FPN: USE_GN: False USE_RELU: False GROUP_NORM: DIM_PER_GP: -1 EPSILON: 1e-05 NUM_GROUPS: 32 KEYPOINT_ON: False MASK_ON: True META_ARCHITECTURE: GeneralizedRCNN RESNETS: NUM_GROUPS: 1 RES2_OUT_CHANNELS: 256 RES5_DILATION: 1 STEM_FUNC: StemWithFixedBatchNorm STEM_OUT_CHANNELS: 64 STRIDE_IN_1X1: True TRANS_FUNC: BottleneckWithFixedBatchNorm WIDTH_PER_GROUP: 64 RETINANET: ANCHOR_SIZES: (32, 64, 128, 256, 512) ANCHOR_STRIDES: (8, 16, 32, 64, 128) ASPECT_RATIOS: (0.5, 1.0, 2.0) BBOX_REG_BETA: 0.11 BBOX_REG_WEIGHT: 4.0 BG_IOU_THRESHOLD: 0.4 FG_IOU_THRESHOLD: 0.5 INFERENCE_TH: 0.05 LOSS_ALPHA: 0.25 LOSS_GAMMA: 2.0 NMS_TH: 0.4 NUM_CLASSES: 81 NUM_CONVS: 4 OCTAVE: 2.0 PRE_NMS_TOP_N: 1000 PRIOR_PROB: 0.01 SCALES_PER_OCTAVE: 3 STRADDLE_THRESH: 0 USE_C5: True RETINANET_ON: False ROI_BOX_HEAD: CONV_HEAD_DIM: 256 DILATION: 1 FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor MLP_HEAD_DIM: 1024 NUM_CLASSES: 81 NUM_STACKED_CONVS: 4 POOLER_RESOLUTION: 7 POOLER_SAMPLING_RATIO: 2 POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) PREDICTOR: FPNPredictor USE_GN: False ROI_HEADS: BATCH_SIZE_PER_IMAGE: 512 BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0) BG_IOU_THRESHOLD: 0.5 DETECTIONS_PER_IMG: 100 FG_IOU_THRESHOLD: 0.5 NMS: 0.5 POSITIVE_FRACTION: 0.25 SCORE_THRESH: 0.05 USE_FPN: True ROI_KEYPOINT_HEAD: CONV_LAYERS: (512, 512, 512, 512, 512, 512, 512, 512) FEATURE_EXTRACTOR: KeypointRCNNFeatureExtractor MLP_HEAD_DIM: 1024 NUM_CLASSES: 17 
POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_SCALES: (0.0625,) PREDICTOR: KeypointRCNNPredictor RESOLUTION: 14 SHARE_BOX_FEATURE_EXTRACTOR: True ROI_MASK_HEAD: CONV_LAYERS: (256, 256, 256, 256) DILATION: 1 FEATURE_EXTRACTOR: MaskRCNNFPNFeatureExtractor MLP_HEAD_DIM: 1024 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 2 POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) POSTPROCESS_MASKS: False POSTPROCESS_MASKS_THRESHOLD: 0.5 PREDICTOR: MaskRCNNC4Predictor RESOLUTION: 28 SHARE_BOX_FEATURE_EXTRACTOR: False USE_GN: False RPN: ANCHOR_SIZES: (32, 64, 128, 256, 512) ANCHOR_STRIDE: (4, 8, 16, 32, 64) ASPECT_RATIOS: (0.5, 1.0, 2.0) BATCH_SIZE_PER_IMAGE: 256 BG_IOU_THRESHOLD: 0.3 FG_IOU_THRESHOLD: 0.7 FPN_POST_NMS_TOP_N_TEST: 1000 FPN_POST_NMS_TOP_N_TRAIN: 2000 MIN_SIZE: 0 NMS_THRESH: 0.7 POSITIVE_FRACTION: 0.5 POST_NMS_TOP_N_TEST: 1000 POST_NMS_TOP_N_TRAIN: 2000 PRE_NMS_TOP_N_TEST: 1000 PRE_NMS_TOP_N_TRAIN: 2000 RPN_HEAD: SingleConvRPNHead STRADDLE_THRESH: 0 USE_FPN: True RPN_ONLY: False WEIGHT: catalog://ImageNetPretrained/MSRA/R-50 OUTPUT_DIR: . PATHS_CATALOG: /workspace/pytorch/maskrcnn_benchmark/config/paths_catalog.py PER_EPOCH_EVAL: True SAVE_CHECKPOINTS: False SOLVER: BASE_LR: 0.0025 BIAS_LR_FACTOR: 2 CHECKPOINT_PERIOD: 2500 GAMMA: 0.1 IMS_PER_BATCH: 2 MAX_ITER: 720000 MOMENTUM: 0.9 STEPS: (480000, 640000) WARMUP_FACTOR: 0.3333333333333333 WARMUP_ITERS: 500 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0001 WEIGHT_DECAY_BIAS: 0 TEST: DETECTIONS_PER_IMG: 100 EXPECTED_RESULTS: [] EXPECTED_RESULTS_SIGMA_TOL: 4 IMS_PER_BATCH: 10
Traceback (most recent call last):
  File "tools/train_mlperf.py", line 309, in <module>
    main()
  File "tools/train_mlperf.py", line 298, in main
    model, success = train(cfg, args.local_rank, args.distributed, args.disable_allreduce_for_logging, random_number_generator)
AttributeError: 'Namespace' object has no attribute 'disable_allreduce_for_logging'

real 0m2.993s
user 0m2.218s
sys 0m0.728s
/workspace
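
For readers landing on the same traceback: the logged `Namespace(...)` contains `config_file`, `distributed`, `local_rank`, `opts`, and `seed`, but no `disable_allreduce_for_logging`, so reading `args.disable_allreduce_for_logging` raises `AttributeError`. The following is a minimal sketch of that failure mode using plain `argparse`. It is not the repository's code: `build_parser`, the `store_true` flag definition, and the `getattr` fallback default are all assumptions made for illustration, and the actual fix in the repository may look different.

```python
import argparse


def build_parser(with_flag: bool) -> argparse.ArgumentParser:
    """Hypothetical stand-in for the argument parser in tools/train_mlperf.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    if with_flag:
        # Assumed definition: the real flag (if any) may use a different type or default.
        parser.add_argument("--disable_allreduce_for_logging", action="store_true")
    return parser


# Reproduce the error: the attribute is read but was never registered on the parser.
args = build_parser(with_flag=False).parse_args([])
try:
    args.disable_allreduce_for_logging
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Option 1: register the argument so the attribute always exists.
fixed_args = build_parser(with_flag=True).parse_args([])

# Option 2: read the attribute defensively with an assumed default of False.
fallback = getattr(args, "disable_allreduce_for_logging", False)

print(error_message)
print(fixed_args.disable_allreduce_for_logging, fallback)
```

Either option only removes the `AttributeError`; whether `False` is the right default for this benchmark is not something the log above can confirm.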

UberIzual commented 1 year ago

same error

FreyaDingDing commented 1 year ago

any update here?

archlitchi commented 1 year ago

same error + 1

johntran-nv commented 1 year ago

We believe this should be fixed with https://github.com/mlcommons/training/pull/556. We'll try to get that merged ASAP. [edit: fixed link]

yuanzhedong commented 1 year ago

> We believe this should be fixed with #542. We'll try to get that merged ASAP.

Here's the right link: https://github.com/mlcommons/training/pull/556

johntran-nv commented 1 year ago

Closing as I believe this is fixed with the PR above.
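
On the original question of how to tell whether a run got anywhere: the log above emits structured `:::MLLOG {json}` lines, and only `init_start` and `seed` appear before the traceback. Below is a rough sketch of a helper that lists which MLLOG keys a log contains. The function names are hypothetical, and the `run_stop` key with a `status` metadata field is an assumption about the MLPerf logging convention rather than something shown in this thread, so check it against the logging package you are using.

```python
import json

MLLOG_PREFIX = ":::MLLOG"


def parse_mllog(text):
    """Return the JSON payload of every ':::MLLOG {...}' line (one event per line)."""
    events = []
    for line in text.splitlines():
        idx = line.find(MLLOG_PREFIX)
        if idx == -1:
            continue
        payload = line[idx + len(MLLOG_PREFIX):].strip()
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip truncated or interleaved lines instead of failing the whole scan.
            continue
    return events


def summarize(events):
    """List the logged keys and the status of the last 'run_stop' event, if any."""
    keys = [event.get("key") for event in events]
    # Assumption: a finished run logs key 'run_stop' with metadata {'status': ...}.
    stops = [event for event in events if event.get("key") == "run_stop"]
    status = stops[-1].get("metadata", {}).get("status") if stops else None
    return {"keys": keys, "run_stop_status": status}


# The two MLLOG lines from the log in this issue, one per line.
sample = '''
:::MLLOG {"namespace": "", "time_ms": 1648590976687, "event_type": "INTERVAL_START", "key": "init_start", "value": null, "metadata": {"file": "tools/train_mlperf.py", "lineno": 216}}
:::MLLOG {"namespace": "", "time_ms": 1648590976748, "event_type": "POINT_IN_TIME", "key": "seed", "value": 1847111940, "metadata": {"file": "tools/train_mlperf.py", "lineno": 263}}
'''

summary = summarize(parse_mllog(sample))
print(summary)
```

For the log in this issue the summary contains only `init_start` and `seed` and no stop event, which matches a run that crashed during initialization, about three seconds in according to the `real 0m2.993s` timing.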