open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0

While training detection model on MOT20, received error: 'ConfigDict' object has no attribute 'device' #540

Open sparshgarg23 opened 2 years ago

sparshgarg23 commented 2 years ago

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help. Yes; the issue was raised in mmdetection, but it still persists in mmtrack.
  2. The bug has not been fixed in the latest version. The bug has not been fixed.

Describe the bug

While training Faster R-CNN (R50-FPN) on MOT20 using the configuration file

    faster-rcnn_r50_fpn_8e_mot20-half.py

I ran into the following error:

    
    /content/mmtracking/mmtrack/core/utils/misc.py:25: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
    f'Setting OMP_NUM_THREADS environment variable for each process '
    /content/mmtracking/mmtrack/core/utils/misc.py:35: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
    f'Setting MKL_NUM_THREADS environment variable for each process '
    2022-05-02 11:26:04,230 - mmtrack - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla T4
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.1, V11.1.105
    GCC: x86_64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.11.0+cu113
    PyTorch compiling details: PyTorch built with:
    - GCC 7.3
    - C++ Version: 201402
    - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
    - OpenMP 201511 (a.k.a. OpenMP 4.5)
    - LAPACK is enabled (usually provided by MKL)
    - NNPACK is enabled
    - CPU capability usage: AVX2
    - CUDA Runtime 11.3
    - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
    - CuDNN 8.2
    - Magma 2.5.2
    - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.12.0+cu113
    OpenCV: 4.1.2
    MMCV: 1.5.0
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.1
    MMTracking: 0.13.0+88f92dd

2022-05-02 11:26:04,231 - mmtrack - INFO - Distributed training: False 2022-05-02 11:26:04,890 - mmtrack - INFO - Config: model = dict( type='FasterRCNN', backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=True, style='pytorch', init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), neck=dict( type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0], clip_border=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict( type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)), roi_head=dict( type='StandardRoIHead', bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=dict( type='Shared2FCBBoxHead', in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2], clip_border=True), reg_class_agnostic=False, loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='SmoothL1Loss', loss_weight=1.0))), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, debug=False), rpn_proposal=dict( nms_pre=2000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), 
rcnn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, debug=False)), test_cfg=dict( rpn=dict( nms_pre=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100)), init_cfg=dict( type='Pretrained', checkpoint= 'http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth' )) dataset_type = 'CocoDataset' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile', to_float32=True), dict(type='LoadAnnotations', with_bbox=True), dict( type='Resize', img_scale=(1088, 1088), ratio_range=(0.8, 1.2), keep_ratio=True, bbox_clip_border=False), dict(type='PhotoMetricDistortion'), dict(type='RandomCrop', crop_size=(1088, 1088), bbox_clip_border=False), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1088, 1088), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data_root = 'data/MOT20/' data = dict( samples_per_gpu=8, workers_per_gpu=8, train=dict( type='CocoDataset', 
ann_file='data/MOT20/annotations/half-train_cocoformat.json', img_prefix='data/MOT20/train', classes=('pedestrian', ), pipeline=[ dict(type='LoadImageFromFile', to_float32=True), dict(type='LoadAnnotations', with_bbox=True), dict( type='Resize', img_scale=(1088, 1088), ratio_range=(0.8, 1.2), keep_ratio=True, bbox_clip_border=False), dict(type='PhotoMetricDistortion'), dict( type='RandomCrop', crop_size=(1088, 1088), bbox_clip_border=False), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ]), val=dict( type='CocoDataset', ann_file='data/MOT20/annotations/half-val_cocoformat.json', img_prefix='data/MOT20/train', classes=('pedestrian', ), pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1088, 1088), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file='data/MOT20/annotations/half-val_cocoformat.json', img_prefix='data/MOT20/train', classes=('pedestrian', ), pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1088, 1088), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(metric=['bbox']) optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) checkpoint_config = 
dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] opencv_num_threads = 0 mp_start_method = 'fork' USE_MMDET = True lr_config = dict( policy='step', warmup='linear', warmup_iters=100, warmup_ratio=0.01, step=[6]) total_epochs = 8 work_dir = './work_dirs/faster-rcnn_r50_fpn_8e_mot20-half' gpu_ids = [0]

    2022-05-02 11:26:04,933 - mmtrack - INFO - Set random seed to 849712315, deterministic: False
    2022-05-02 11:26:05,370 - mmtrack - INFO - initialize FasterRCNN with init_cfg {'type': 'Pretrained', 'checkpoint': 'http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth'}
    2022-05-02 11:26:05,370 - mmcv - INFO - load model from: http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth
    2022-05-02 11:26:05,371 - mmcv - INFO - load checkpoint from http path: http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth
    Downloading: "http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth" to /root/.cache/torch/hub/checkpoints/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth
    100% 160M/160M [00:16<00:00, 10.4MB/s]
    2022-05-02 11:26:22,426 - mmcv - WARNING - The model and loaded state dict do not match exactly

    size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([2, 1024]).
    size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for roi_head.bbox_head.fc_reg.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([4, 1024]).
    size mismatch for roi_head.bbox_head.fc_reg.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([4]).
    loading annotations into memory...
    Done (t=3.20s)
    creating index...
    index created!
    /usr/local/lib/python3.7/dist-packages/mmdet/utils/compat_config.py:30: UserWarning: config is now expected to have a runner section, please set runner in your config.
      'please set runner in your config.', UserWarning)
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      cpuset_checked))
    Traceback (most recent call last):
      File "tools/train.py", line 210, in <module>
        main()
      File "tools/train.py", line 206, in main
        meta=meta)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/apis/train.py", line 163, in train_detector
        model = build_dp(model, cfg.device, device_ids=cfg.gpu_ids)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/config.py", line 513, in __getattr__
        return getattr(self._cfg_dict, name)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/config.py", line 49, in __getattr__
        raise ex
    AttributeError: 'ConfigDict' object has no attribute 'device'
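
A side note on the size mismatch warnings in the log: they look unrelated to the AttributeError. The shapes are consistent with a COCO checkpoint (80 classes) being loaded into a one-class (pedestrian) head with reg_class_agnostic=False, as in the config dump. The sketch below only reproduces that arithmetic; bbox_head_shapes is a hypothetical helper written for illustration, not a function from mmdet or mmtrack.

```python
def bbox_head_shapes(num_classes, fc_out_channels=1024, reg_class_agnostic=False):
    """Hypothetical helper: expected parameter shapes of the final bbox-head layers.

    Assumes the classifier has one extra output for background and the
    regressor has 4 box deltas per class unless it is class-agnostic.
    """
    cls_out = num_classes + 1
    reg_out = 4 if reg_class_agnostic else 4 * num_classes
    return {
        'fc_cls.weight': (cls_out, fc_out_channels),
        'fc_cls.bias': (cls_out,),
        'fc_reg.weight': (reg_out, fc_out_channels),
        'fc_reg.bias': (reg_out,),
    }


coco_shapes = bbox_head_shapes(num_classes=80)  # checkpoint side of the warning
mot_shapes = bbox_head_shapes(num_classes=1)    # current model side of the warning

for name in coco_shapes:
    print(name, coco_shapes[name], '->', mot_shapes[name])
```

Under these assumptions the four mismatched tensors are exactly the ones the warning lists, which suggests the rest of the checkpoint loaded normally.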


**Reproduction**

1. What command or script did you run?
  1. Execute !python ./tools/convert_datasets/mot/mot2coco.py -i ./data/MOT20/ -o ./data/MOT20/annotations --split-train --convert-det
  2. Convert mot2reid using !python ./tools/convert_datasets/mot/mot2reid.py -i ./data/MOT20/ -o ./data/MOT20/reid --val-split 0.2 --vis-threshold 0.3
  3. Train using !python tools/train.py /content/mmtracking/configs/det/faster-rcnn_r50_fpn_8e_mot20-half.py

2. Did you make any modifications on the code or config? Did you understand what you have modified? I only changed samples_per_gpu from 2 to 8 in datasets/mot_challenge_det.py.

3. What dataset did you use and what task did you run? MOT20, multi-object tracking. I first trained the detection model, as mentioned in quick_run.md.

**Environment**

Environment information

    
    sys.platform: linux
    Python: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla T4
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.1, V11.1.105
    GCC: x86_64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.11.0+cu113
    PyTorch compiling details: PyTorch built with:
    - GCC 7.3
    - C++ Version: 201402
    - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
    - OpenMP 201511 (a.k.a. OpenMP 4.5)
    - LAPACK is enabled (usually provided by MKL)
    - NNPACK is enabled
    - CPU capability usage: AVX2
    - CUDA Runtime 11.3
    - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
    - CuDNN 8.2
    - Magma 2.5.2

    TorchVision: 0.12.0+cu113
    OpenCV: 4.1.2
    MMCV: 1.5.0
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.1
    MMTracking: 0.13.0+88f92dd



**Bug fix**
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
It seems that the same error was reported in MMDetection issue #7848, but the fix there is not applicable here because train.py in mmtrack makes no mention of cfg.device.

sparshgarg23 commented 2 years ago

Wanted to check if this is an issue with mmdetection or with mmtrack?

dyhBUPT commented 2 years ago

Hi, this is probably caused by an update to mmdet. You can try adding cfg.device in train.py, as in https://github.com/open-mmlab/mmdetection/blob/master/tools/train.py#L196

Please let me know if this helps.
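
A minimal sketch of that kind of change, under stated assumptions: the traceback shows mmdet's train_detector reading cfg.device, so the idea is to set that attribute on the loaded config before training starts. The helpers pick_device and ensure_device are hypothetical names used for illustration, and SimpleNamespace stands in for the real config object so the snippet runs on its own.

```python
from types import SimpleNamespace


def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU can be used.
        return 'cpu'
    return 'cuda' if torch.cuda.is_available() else 'cpu'


def ensure_device(cfg):
    """Set cfg.device only when the config does not define it yet."""
    if getattr(cfg, 'device', None) is None:
        cfg.device = pick_device()
    return cfg


# Stand-in for the config loaded in tools/train.py (hypothetical values).
cfg = SimpleNamespace(gpu_ids=[0])
ensure_device(cfg)
print(cfg.device)
```

In tools/train.py the same assignment would presumably go after the config is loaded and before the training call, so that the cfg.device lookup shown in the traceback succeeds.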

sparshgarg23 commented 2 years ago

I was able to proceed with training after making the suggested changes, but the training process now seems to be stuck at this stage:

Loading and preparing results...
DONE (t=1.68s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*

Any reason why this is happening?

sparshgarg23 commented 2 years ago

I was able to resolve the previous issue; it turns out that evaluating annotation type bbox takes 21 minutes per epoch. I am currently training the ReID model on MOT20, and each epoch processes about 76139 batches. Since I am training on Colab, is there a way to speed up the process? Increasing the batch size beyond 8 results in a CUDA out-of-memory error.

dyhBUPT commented 2 years ago

It seems that the speed is limited by the GPU memory in your case.
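
If GPU memory caps the batch size, one thing that can still be trimmed is the overhead around the training steps. The fragment below is a sketch, not a tested recipe: it reuses field names from the config dump in this issue, lowers workers_per_gpu to the 4 suggested by the DataLoader warning, and evaluates every 4 epochs instead of every epoch. The interval value and the assumption that evaluation runs when the epoch number is a multiple of it are illustrative choices.

```python
# Hypothetical override values; field names follow the config dump above.
data = dict(
    samples_per_gpu=8,   # largest batch size reported to fit on the T4
    workers_per_gpu=4,   # the DataLoader warning suggested at most 4 workers
)
evaluation = dict(metric=['bbox'], interval=4)  # evaluate less often

total_epochs = 8          # from the config dump
minutes_per_eval = 21     # figure reported earlier in this thread

# Epochs at which evaluation would run, assuming it triggers on multiples of interval.
evaluated_epochs = [
    epoch for epoch in range(1, total_epochs + 1)
    if epoch % evaluation['interval'] == 0
]
minutes_saved = minutes_per_eval * (total_epochs - len(evaluated_epochs))
print(evaluated_epochs, minutes_saved)
```

This does not make individual iterations faster; it only avoids repeating the 21-minute evaluation after every epoch.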

sparshgarg23 commented 2 years ago

Well, that would make sense. I am working on Colab, so I only have access to one GPU.

@dyhBUPT could you tell me how to go about fixing #542? I was able to obtain the demo results using the pretrained detection and ReID models for Tracktor, but when I used my own detection model, I got a key error on 'detections' when I tried to run inference.