open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0

Problem encountered when testing #459

AndrewGuo0930 opened this issue 2 years ago

AndrewGuo0930 commented 2 years ago

Describe the bug

Reproduction

  1. What command or script did you run?
PORT=29514 ./tools/dist_test.sh configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --checkpoint work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth --out results.pkl --eval bbox track
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use and what task did you run?

Environment

sys.platform: linux
Python: 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 20:15:55) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2
OpenCV: 4.5.5
MMCV: 1.4.5
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.1
MMTracking: 0.10.0+

Error traceback

Traceback (most recent call last):
  File "./tools/test.py", line 224, in <module>
    main()
  File "./tools/test.py", line 214, in main
    metric = dataset.evaluate(outputs, **eval_kwargs)
  File "/cluster/home/it_stu12/main/SatVideoDT/mmdetection/mmdet/datasets/dataset_wrappers.py", line 108, in evaluate
    ('Dataset and results have different sizes: '
AssertionError: Dataset and results have different sizes: 3724 v.s. 2
Traceback (most recent call last):
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/cluster/home/it_stu12/.conda/envs/SatVideoDT/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/it_stu12/.conda/envs/SatVideoDT/bin/python', '-u', './tools/test.py', '--local_rank=0', 'configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py', '--launcher', 'pytorch', '--checkpoint', 'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth', '--out', 'results.pkl', '--eval', 'bbox', 'track']' returned non-zero exit status 1.
AndrewGuo0930 commented 2 years ago

Here's my config bytetrack_yolox_s_512_alltrain_sat.py.

_base_ = [
    '../../_base_/datasets/mot_challenge.py', '../../_base_/default_runtime.py'
]

img_scale = (512, 512)
samples_per_gpu = 4

model = dict(
    type='ByteTrack',
    detector=dict(
        type='YOLOX',
        input_size=img_scale,
        random_size_range=(18, 32),
        random_size_interval=10,
        backbone=dict(
            type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
        neck=dict(
            type='YOLOXPAFPN',
            in_channels=[128, 256, 512],
            out_channels=128,
            num_csp_blocks=1),
        bbox_head=dict(
            type='YOLOXHead',
            num_classes=4,
            in_channels=128,
            feat_channels=128),
        train_cfg=dict(
            assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
        test_cfg=dict(
            score_thr=0.01, nms=dict(type='nms', iou_threshold=0.7)),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=  # noqa: E251
            '/cluster/home/it_stu12/main/SatVideoDT/mmtracking/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth'  # noqa: E501
        )),
    motion=dict(type='KalmanFilter'),
    tracker=dict(
        type='ByteTracker',
        obj_score_thrs=dict(high=0.6, low=0.1),
        init_track_thr=0.7,
        weight_iou_with_det_scores=True,
        match_iou_thrs=dict(high=0.1, low=0.5, tentative=0.3),
        num_frames_retain=30))

train_pipeline = [
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        bbox_clip_border=False),
    dict(
        type='RandomAffine',
        scaling_ratio_range=(0.1, 2),
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        bbox_clip_border=False),
    dict(
        type='MixUp',
        img_scale=img_scale,
        ratio_range=(0.8, 1.6),
        pad_val=114.0,
        bbox_clip_border=False),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Resize',
        img_scale=img_scale,
        keep_ratio=True,
        bbox_clip_border=False),
    dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[0.0, 0.0, 0.0],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(
                type='Pad',
                size_divisor=32,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='VideoCollect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=4,
    persistent_workers=True,
    train=dict(
        _delete_=True,
        type='MultiImageMixDataset',
        dataset=dict(
            type='CocoDataset',
            ann_file=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/train_cocoformat.json',
            ],
            img_prefix=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/training_data',
            ],
            classes=('car', 'ship', 'plane', 'train'),
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True)
            ],
            filter_empty_gt=False),
        pipeline=train_pipeline),
    val=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)),
    test=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)))

# optimizer
# default 8 gpu
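# lr follows the linear scaling rule: the base lr of 0.001 assumes 8 GPUs,
# so a single-GPU run with samples_per_gpu=4 gives lr = 0.001 / 8 * 4 = 5e-4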
optimizer = dict(
    type='SGD',
    lr=0.001 / 8 * samples_per_gpu,
    momentum=0.9,
    weight_decay=5e-4,
    nesterov=True,
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=None)

# some hyper parameters
total_epochs = 80
num_last_epochs = 10
resume_from = None
interval = 5

# learning policy
lr_config = dict(
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=1,
    num_last_epochs=num_last_epochs,
    min_lr_ratio=0.05)

custom_hooks = [
    dict(
        type='YOLOXModeSwitchHook',
        num_last_epochs=num_last_epochs,
        priority=48),
    dict(
        type='SyncNormHook',
        num_last_epochs=num_last_epochs,
        interval=interval,
        priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=resume_from,
        momentum=0.0001,
        priority=49)
]

checkpoint_config = dict(interval=1)
evaluation = dict(metric=['bbox', 'track'], interval=1)
search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML']

# you need to set mode='dynamic' if you are using pytorch<=1.5.0
fp16 = dict(loss_scale=dict(init_scale=512.))
AndrewGuo0930 commented 2 years ago

The training script I used, train.sh:

PORT=29504 ./tools/dist_train.sh /cluster/home/it_stu12/main/SatVideoDT/mmtracking/configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --no-validate
Seerkfang commented 2 years ago

You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and num_classes in bbox_head is set to 1, which means it is only used for pedestrian detection.

If you've gone through the whole inference procedure, the size of the results should be the same as the dataloader length, because every forward result (even an empty one) is appended to the final results.
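For reference, here is a minimal sketch of the size check that raises the error above (paraphrasing mmdet's dataset_wrappers.py, not the exact source):

# Paraphrased sketch of the failing check: evaluate() expects exactly one
# result entry per dataset sample, because the test loop appends every
# forward result, even an empty one.
def evaluate(dataset, results, **eval_kwargs):
    assert len(results) == len(dataset), (
        'Dataset and results have different sizes: '
        f'{len(dataset)} v.s. {len(results)}')
    # ... metric computation follows ...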

AndrewGuo0930 commented 2 years ago

But I've already set num_classes to 4 in my config and still encounter the problem.

Seerkfang commented 2 years ago

You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and is not updated (if you didn't change the code). In this case, even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on those untrained classes.
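If you want to verify how many classes your checkpoint's head was actually trained with, here is a hedged sketch; the key substring multi_level_conv_cls is an assumption based on mmdet's YOLOXHead and may differ across versions:

# Hedged sketch: read the classification conv weights from the checkpoint;
# their out_channels equals num_classes in YOLOXHead.
import torch

ckpt = torch.load(
    'work_dirs/bytetrack_yolox_s_512_alltrain_sat/epoch_80.pth',
    map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)
for key, weight in state_dict.items():
    if 'multi_level_conv_cls' in key and key.endswith('.weight'):
        print(key, '-> num_classes =', weight.shape[0])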

AndrewGuo0930 commented 2 years ago

You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and is not updated (if you didn't change the code). In this case, even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on those untrained classes.

Does that mean I could modify the code to track 4 classes rather than only 1 class? Could you please tell me which code I should modify, the configuration?

MarcoFrancescoMerola-rgb commented 2 years ago

You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and num_classes in bbox_head is set to 1, which means it is only used for pedestrian detection.

If you've gone through the whole inference procedure, the size of the results should be the same as the dataloader length, because every forward result (even an empty one) is appended to the final results.

Hi, I am using the demo_mot_vis.py script to run inference with bytetrack_yolox_x_crowdhuman_mot17-private-half.py as the config. My goal is multi-class MOT. The result is as wanted, but looking into the config file, the base file bytetrack_yolox_x_crowdhuman_mot17-private-half.py has bbox_head=dict(num_classes=1) at line 14. Now I am trying to understand: how can I define the number of classes and the specific classes to consider?
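For context, a hypothetical sketch of what such an override could look like, mirroring the num_classes and classes settings from AndrewGuo0930's config earlier in this thread (whether ByteTrack's tracker then handles multiple classes correctly is exactly what this thread leaves open):

# Hypothetical override sketch, not a verified recipe: it mirrors the
# num_classes / classes settings from the config earlier in this thread.
# The checkpoint you load must match this head shape.
_base_ = ['./bytetrack_yolox_x_crowdhuman_mot17-private-half.py']

model = dict(detector=dict(bbox_head=dict(num_classes=4)))  # was 1

classes = ('car', 'ship', 'plane', 'train')  # placeholder class names
data = dict(
    train=dict(dataset=dict(classes=classes)),
    val=dict(classes=classes),
    test=dict(classes=classes))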

Zachein commented 1 year ago

Hello. I met the same problem when I trained on a custom dataset. Have you solved this problem?

AndrewGuo0930 commented 1 year ago

Hello. I met the same problem when I trained on a custom dataset. Have you solved this problem?

No. I haven't used MMTracking in a long time. Maybe multi-class MOT is supported now? You can open a new issue for help.