open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0

"TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'" when evaluate CLEAR MOT results. #500

Open AndrewGuo0930 opened 2 years ago

AndrewGuo0930 commented 2 years ago

Describe the bug

After training for 1 epoch on my custom dataset with QDTrack, it started to evaluate the CLEAR MOT results, but an error occurred: "TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'".

Reproduction

1. What command or script did you run?

PORT=29504 ./tools/dist_train.sh ./configs/mot/qdtrack/qdtrack_faster-rcnn_dcnv2_r50_fpn_4e_sat-airplane.py 1

2. Did you make any modifications to the code or config? Did you understand what you have modified? Here is my config.

img_scale = (512, 512)
fp16 = dict(loss_scale='dynamic')
classes = ('airplane', )

model = dict(
    type='QDTrack',
    # freeze_detector=True,
    detector=dict(
        type='FasterRCNN',
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch',
            init_cfg=dict(type='Pretrained', checkpoint='./checkpoints/resnet50-19c8e357.pth'),
            dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
            stage_with_dcn=(False, True, True, True)),
        neck=dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[2],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        roi_head=dict(
            type='StandardRoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=4,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=-1,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=2000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)),
        test_cfg=dict(
            rpn=dict(
                nms_pre=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=150)),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=
            './checkpoints/faster_rcnn_r50_fpn_mdconv_c3-c5_1x_sat_epoch5_20220411.pth'
        )),
    track_head=dict(
        type='QuasiDenseTrackHead',
        roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        embed_head=dict(
            type='QuasiDenseEmbedHead',
            num_convs=4,
            num_fcs=1,
            embed_channels=256,
            norm_cfg=dict(type='GN', num_groups=32),
            loss_track=dict(type='MultiPosCrossEntropyLoss', loss_weight=0.25),
            loss_track_aux=dict(
                type='L2Loss',
                neg_pos_ub=3,
                pos_margin=0,
                neg_margin=0.1,
                hard_mining=True,
                loss_weight=1.0)),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0),
        train_cfg=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='CombinedSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=3,
                add_gt_as_proposals=True,
                pos_sampler=dict(type='InstanceBalancedPosSampler'),
                neg_sampler=dict(type='RandomSampler')))),
    tracker=dict(
        type='QuasiDenseEmbedTracker',
        init_score_thr=0.9,
        # obj_score_thr=0.5,
        # match_score_thr=0.5,
        obj_score_thr=0.2,
        match_score_thr=0.2,
        memo_tracklet_frames=30,
        memo_backdrop_frames=1,
        memo_momentum=0.8,
        # nms_conf_thr=0.5,
        nms_conf_thr=0.2,
        nms_backdrop_iou_thr=0.3,
        nms_class_iou_thr=0.7,
        with_cats=True,
        match_metric='bisoftmax'))

dataset_type = 'MOTChallengeDataset'
img_norm_cfg = dict(
    mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadMultiImagesFromFile', to_float32=True),
    dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True),
    dict(
        type='SeqResize',
        img_scale=img_scale,
        share_params=True,
        ratio_range=(0.8, 1.2),
        keep_ratio=True,
        bbox_clip_border=False),
    dict(type='SeqPhotoMetricDistortion', share_params=True),
    dict(
        type='SeqRandomCrop',
        share_params=False,
        crop_size=img_scale,
        bbox_clip_border=False),
    dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5),
    dict(
        type='SeqNormalize',
        mean=[103.53, 116.28, 123.675],
        std=[1.0, 1.0, 1.0],
        to_rgb=False),
    dict(type='SeqPad', size_divisor=32),
    dict(type='MatchInstances', skip_nomatch=True),
    dict(
        type='VideoCollect',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_match_indices',
            'gt_instance_ids'
        ]),
    dict(type='SeqDefaultFormatBundle', ref_prefix='ref')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='VideoCollect', keys=['img'])
        ])
]

data_root = '../datasets/VISO/Track3'
data = dict(
    # samples_per_gpu=2,
    # workers_per_gpu=2,
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        visibility_thr=-1,
        ann_file='../datasets/VISO/Track3/train_cocoformat.json',
        img_prefix='../datasets/VISO/Track3/training_data',
        classes=classes,
        ref_img_sampler=dict(
            num_ref_imgs=1,
            frame_range=10,
            filter_key_img=True,
            method='uniform'),
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file='../datasets/VISO/Track3/val_cocoformat.json',
        img_prefix='../datasets/VISO/Track3/validation_data',
        classes=classes,
        ref_img_sampler=None,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file='../datasets/VISO/Track3/val_cocoformat.json',
        img_prefix='../datasets/VISO/Track3/validation_data',
        classes=classes,
        ref_img_sampler=None,
        pipeline=test_pipeline))

optimizer = dict(type='SGD', lr=5e-5, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
lr_config = dict(policy='step', step=[3])
total_epochs = 4
evaluation = dict(metric=['bbox', 'track'], interval=1)
work_dir = './work_dirs/qdtrack_faster-rcnn_dcnv2_r50_fpn_4e_sat_scale2_airplane_justclass'

3. What dataset did you use and what task did you run? A dataset provided by a competition, with 4 classes: 'car', 'airplane', 'ship', and 'train'. I'm working on a multi-class MOT task.

Environment

1. Please run python mmtrack/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 20:15:55) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2
OpenCV: 4.5.5
MMCV: 1.4.5
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMTracking: 0.12.0+11e2d84

2. You may add additional information that may be helpful for locating the problem, such as: My dataset has 4 classes. However, since MMTracking only supports single-class MOT now, I'm trying to train 4 models, one for each class. Can I just change num_classes in bbox_head and classes=('xxx', ) in data? Or should I generate a dataset that only contains the class I focus on? Thank you so much!

Error traceback

If applicable, paste the error traceback here.

Traceback (most recent call last):
  File "./tools/train.py", line 210, in <module>
    main()
  File "./tools/train.py", line 206, in main
    meta=meta)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmtracking/mmtrack/apis/train.py", line 175, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmcv/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/cluster/home/it_stu12/main/SatVideoDT/mmcv/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmcv/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch
    self._do_evaluate(runner)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmtracking/mmtrack/core/evaluation/eval_hooks.py", line 62, in _do_evaluate
    key_score = self.evaluate(runner, results)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmcv/mmcv/runner/hooks/evaluation.py", line 362, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmtracking/mmtrack/datasets/mot_challenge_dataset.py", line 418, in evaluate
    dataset = [trackeval.datasets.MotChallenge2DBox(dataset_config)]
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/trackeval/datasets/mot_challenge_2d_box.py", line 50, in __init__
    gt_set = self.config['BENCHMARK'] + '-' + self.config['SPLIT_TO_EVAL']
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/tempfile.py:796: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp_mfoou0f'>
  _warnings.warn(warn_message, ResourceWarning)
Traceback (most recent call last):
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/it_stu12/.conda/envs/SatVideoDT/bin/python', '-u', './tools/train.py', '--local_rank=0', './configs/mot/qdtrack/qdtrack_faster-rcnn_dcnv2_r50_fpn_4e_sat-airplane.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Eval config

Eval Config:
USE_PARALLEL         : False                         
NUM_PARALLEL_CORES   : 8                             
BREAK_ON_ERROR       : True                          
RETURN_ON_ERROR      : False                         
LOG_ON_ERROR         : /cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/error_log.txt
PRINT_RESULTS        : True                          
PRINT_ONLY_COMBINED  : False                         
PRINT_CONFIG         : True                          
TIME_PROGRESS        : True                          
DISPLAY_LESS_PROGRESS : True                          
OUTPUT_SUMMARY       : True                          
OUTPUT_EMPTY_CLASSES : True                          
OUTPUT_DETAILED      : True                          
PLOT_CURVES          : True                          

MotChallenge2DBox Config:
GT_FOLDER            : ../datasets/VISO/Track3/validation_data
TRACKERS_FOLDER      : /tmp/tmp_mfoou0f              
OUTPUT_FOLDER        : None                          
TRACKERS_TO_EVAL     : ['track']                     
CLASSES_TO_EVAL      : ['airplane']                  
BENCHMARK            : None                          
SPLIT_TO_EVAL        : train                         
INPUT_AS_ZIP         : False                         
PRINT_CONFIG         : True                          
DO_PREPROC           : True                          
TRACKER_SUB_FOLDER   :                               
OUTPUT_SUB_FOLDER    :                               
TRACKER_DISPLAY_NAMES : None                          
SEQMAP_FOLDER        : None                          
SEQMAP_FILE          : /tmp/tmp_mfoou0f/videoseq.txt 
SEQ_INFO             : None                          
GT_LOC_FORMAT        : {gt_folder}/{seq}/gt/gt.txt   
SKIP_SPLIT_FOL       : True                    
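
Substituting the values from the dump above into the line shown in the traceback makes the failure obvious (this is just a reading of the log, not a fix):

# trackeval/datasets/mot_challenge_2d_box.py, line 50 (from the traceback)
gt_set = self.config['BENCHMARK'] + '-' + self.config['SPLIT_TO_EVAL']
#      = None                     + '-' + 'train'   -> TypeError
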
Seerkfang commented 2 years ago

Yes, for customized usage you can change num_classes in bbox_head and classes in the data config.
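
For example, a single-class 'airplane' setup might look like the sketch below; only the fields that differ from your posted config are shown, and the values are illustrative rather than verified:

_base_ = ['./qdtrack_faster-rcnn_dcnv2_r50_fpn_4e_sat-airplane.py']  # your posted config

classes = ('airplane', )
model = dict(
    detector=dict(
        roi_head=dict(
            bbox_head=dict(num_classes=1))))  # one foreground class instead of 4
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))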

As for your problem, MOTChallengeDataset contains some hard-coded logic that only supports the naming rules of the MOTChallenge series datasets (this comes from the third-party evaluation toolkit TrackEval, which is used for HOTA). So there is more you need to modify in MOTChallengeDataset. If you don't care about the HOTA value, I suggest ignoring that code for now.
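
If you do want the TrackEval-based evaluation to run on a non-MOTChallenge dataset, one possible workaround (an untested sketch; the exact surrounding code depends on your MMTracking version) is to fill in the two fields that come out as None, right before the call shown in your traceback:

# mmtrack/datasets/mot_challenge_dataset.py, inside evaluate():
# dataset_config is the dict printed as "MotChallenge2DBox Config" above.
# 'MOT17' is only a placeholder benchmark name for a custom dataset; whether a
# dummy name is enough depends on how TrackEval then resolves seqmaps/gt paths.
dataset_config['BENCHMARK'] = 'MOT17'
dataset_config['SPLIT_TO_EVAL'] = 'train'
dataset = [trackeval.datasets.MotChallenge2DBox(dataset_config)]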

AndrewGuo0930 commented 2 years ago

Thank you for your response! So which code should I modify? Can I just modify the code in ./mmtrack/datasets/mot_challenge_dataset.py? Another question: when I run the training script with PORT=29504 ./tools/dist_train.sh ./configs/mot/qdtrack/qdtrack_faster-rcnn_dcnv2_r50_fpn_4e_sat-airplane.py 1, does it train both the detector and the tracker, or just the tracker? Finally, I would appreciate it if you could help me with #499 and #491. Thank you so much!

noahcao commented 2 years ago

@AndrewGuo0930 To evaluate on multiple classes, you have to customize CLASSES_TO_EVAL in TrackEval, as here. The corresponding line to edit in mmtrack/datasets/mot_challenge_dataset.py is here.
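
Roughly, the two edits look like this; it is only a sketch, and the exact names should be checked against the TrackEval and MMTracking versions you have installed:

# 1) trackeval/datasets/mot_challenge_2d_box.py (MotChallenge2DBox.__init__):
#    only 'pedestrian' is accepted by default, so the valid-class list and the
#    class-name-to-id mapping next to it have to be extended with your classes
#    ('car', 'airplane', 'ship', 'train').
# 2) mmtrack/datasets/mot_challenge_dataset.py, where dataset_config is built
#    for TrackEval: pass your own classes instead of the default, e.g.
dataset_config['CLASSES_TO_EVAL'] = list(self.CLASSES)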

For other problems, please open new issues with proper titles for discussion.