AndrewGuo0930 opened 2 years ago
Here's my config, bytetrack_yolox_s_512_alltrain_sat.py:
_base_ = [
    '../../_base_/datasets/mot_challenge.py', '../../_base_/default_runtime.py'
]

img_scale = (512, 512)
samples_per_gpu = 4

model = dict(
    type='ByteTrack',
    detector=dict(
        type='YOLOX',
        input_size=img_scale,
        random_size_range=(18, 32),
        random_size_interval=10,
        backbone=dict(
            type='CSPDarknet', deepen_factor=0.33, widen_factor=0.5),
        neck=dict(
            type='YOLOXPAFPN',
            in_channels=[128, 256, 512],
            out_channels=128,
            num_csp_blocks=1),
        bbox_head=dict(
            type='YOLOXHead',
            num_classes=4,
            in_channels=128,
            feat_channels=128),
        train_cfg=dict(
            assigner=dict(type='SimOTAAssigner', center_radius=2.5)),
        test_cfg=dict(
            score_thr=0.01, nms=dict(type='nms', iou_threshold=0.7)),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=  # noqa: E251
            '/cluster/home/it_stu12/main/SatVideoDT/mmtracking/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth'  # noqa: E501
        )),
    motion=dict(type='KalmanFilter'),
    tracker=dict(
        type='ByteTracker',
        obj_score_thrs=dict(high=0.6, low=0.1),
        init_track_thr=0.7,
        weight_iou_with_det_scores=True,
        match_iou_thrs=dict(high=0.1, low=0.5, tentative=0.3),
        num_frames_retain=30))

train_pipeline = [
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        bbox_clip_border=False),
    dict(
        type='RandomAffine',
        scaling_ratio_range=(0.1, 2),
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        bbox_clip_border=False),
    dict(
        type='MixUp',
        img_scale=img_scale,
        ratio_range=(0.8, 1.6),
        pad_val=114.0,
        bbox_clip_border=False),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Resize',
        img_scale=img_scale,
        keep_ratio=True,
        bbox_clip_border=False),
    dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_scale,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[0.0, 0.0, 0.0],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(
                type='Pad',
                size_divisor=32,
                pad_val=dict(img=(114.0, 114.0, 114.0))),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='VideoCollect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=4,
    persistent_workers=True,
    train=dict(
        _delete_=True,
        type='MultiImageMixDataset',
        dataset=dict(
            type='CocoDataset',
            ann_file=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/train_cocoformat.json',
            ],
            img_prefix=[
                '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/training_data',
            ],
            classes=('car', 'ship', 'plane', 'train'),
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True)
            ],
            filter_empty_gt=False),
        pipeline=train_pipeline),
    val=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)),
    test=dict(
        pipeline=test_pipeline,
        ann_file=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/annotations2/val_cocoformat.json',
        ],
        img_prefix=[
            '/cluster/home/it_stu12/main/SatVideoDT/datasets/VISO/validation_data',
        ],
        classes=('car', 'ship', 'plane', 'train'),
        interpolate_tracks_cfg=dict(min_num_frames=5, max_num_frames=20)))

# optimizer
# default 8 gpu
optimizer = dict(
    type='SGD',
    lr=0.001 / 8 * samples_per_gpu,
    momentum=0.9,
    weight_decay=5e-4,
    nesterov=True,
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=None)

# some hyper parameters
total_epochs = 80
num_last_epochs = 10
resume_from = None
interval = 5

# learning policy
lr_config = dict(
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=1,
    num_last_epochs=num_last_epochs,
    min_lr_ratio=0.05)

custom_hooks = [
    dict(
        type='YOLOXModeSwitchHook',
        num_last_epochs=num_last_epochs,
        priority=48),
    dict(
        type='SyncNormHook',
        num_last_epochs=num_last_epochs,
        interval=interval,
        priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=resume_from,
        momentum=0.0001,
        priority=49)
]

checkpoint_config = dict(interval=1)
evaluation = dict(metric=['bbox', 'track'], interval=1)
search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML']
# you need to set mode='dynamic' if you are using pytorch<=1.5.0
fp16 = dict(loss_scale=dict(init_scale=512.))
The training script I used, train.sh:
PORT=29504 ./tools/dist_train.sh /cluster/home/it_stu12/main/SatVideoDT/mmtracking/configs/mot/bytetrack/bytetrack_yolox_s_512_alltrain_sat.py 1 --no-validate
You can see from the config file for ByteTrack that the detector is trained on MOT17 and CrowdHuman, and that num_classes in bbox_head is set to 1, meaning it is only used for pedestrian detection.
If you go through the whole inference procedure, the size of results should equal the dataloader length: every forward result (even an empty one) is appended to the final results.
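The invariant described above can be sketched in plain Python. All names below are illustrative stand-ins, not MMTracking's actual API; the point is only that one entry is appended per frame, empty or not, so the results list and the dataset always have the same length.

```python
def detect(frame):
    """Dummy detector: finds nothing on odd-numbered frames."""
    return [] if frame % 2 else [('car', 0.9)]

def run_inference(dataloader):
    """Collect one result per frame, even when nothing is detected."""
    results = []
    for frame in dataloader:
        detections = detect(frame)   # may be an empty list
        results.append(detections)   # append unconditionally
    return results

dataloader = list(range(6))          # stands in for 6 video frames
results = run_inference(dataloader)

# evaluate() would raise "Dataset and results have different
# sizes" if these two lengths ever differed
assert len(results) == len(dataloader)
```

If the loop skipped frames with no detections, the final assertion would fail exactly like the error reported in this issue.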
But I've already set num_classes to 4 in my config and still encounter the problem.
You are running the test code, which means the state_dict is loaded from the pretrained checkpoint and is not updated (unless you changed the code). So even if you change num_classes in bbox_head, the pretrained one-class detector will probably behave badly on those untrained classes.
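One way to see this mismatch concretely is to compare the classification-branch shape stored in the checkpoint against the configured num_classes. The sketch below fakes a state_dict as a name-to-shape mapping; a real check would torch.load the .pth file and read each tensor's .shape, and the parameter name shown is only an illustrative guess at a YOLOX-style key.

```python
# Hypothetical pretrained state_dict, reduced to parameter shapes.
# In a YOLOX head, the cls conv has num_classes output channels.
pretrained = {
    'detector.bbox_head.multi_level_conv_cls.0.weight': (1, 128, 1, 1),
}

def head_num_classes(state_dict):
    """Read the class count from the cls conv's output channels."""
    shape = state_dict['detector.bbox_head.multi_level_conv_cls.0.weight']
    return shape[0]

num_classes = 4  # what the new config asks for
loaded = head_num_classes(pretrained)
if loaded != num_classes:
    print(f'checkpoint head has {loaded} class(es), '
          f'config wants {num_classes}')
```

When the shapes disagree, the checkpoint's head weights cannot simply be reused for the new classes, which is why a one-class detector behaves badly on a four-class dataset.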
Does that mean I could modify the code to track 4 classes rather than only 1? Could you please tell me which code I should modify? The configuration?
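For reference, making a config multi-class generally comes down to two settings: the detector head's num_classes and the dataset's classes tuple. A minimal sketch, based on the config posted above and the MMTracking 0.x config style (keys abbreviated to the relevant parts):

```python
# Sketch: the two places the class count is defined.
classes = ('car', 'ship', 'plane', 'train')

model = dict(
    detector=dict(
        bbox_head=dict(num_classes=len(classes))))  # was 1 for MOT17

data = dict(
    train=dict(dataset=dict(classes=classes)),
    val=dict(classes=classes),
    test=dict(classes=classes))
```

Both must agree: the head decides how many class scores the detector predicts, while the dataset's classes tuple maps annotation category names to label indices.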
Hi, I am using the demo_mot_vis.py script to run inference with bytetrack_yolox_x_crowdhuman_mot17-private-half.py as the config. My goal is multi-class MOT. The result is as expected, but looking into the config, the base file bytetrack_yolox_x_crowdhuman_mot17-private-half.py has bbox_head=dict(num_classes=1) at line 14.
Now I am trying to understand: how can I define the number of classes and the specific classes to consider?
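If you only want certain classes in the output, one option that is independent of the config is to filter the per-class results after inference. The sketch below assumes detection results come back as one list per class id, which is the usual MMDetection-style layout; the class names and helper are hypothetical.

```python
# Keep only selected classes from per-class detection results.
# Assumes results are a list indexed by class id (MMDetection-style).
CLASSES = ('car', 'ship', 'plane', 'train')

def filter_classes(per_class_results, keep):
    """Blank out every class not named in `keep`."""
    keep_ids = {CLASSES.index(name) for name in keep}
    return [dets if i in keep_ids else []
            for i, dets in enumerate(per_class_results)]

# One frame's results: boxes as (x1, y1, x2, y2, score) per class.
frame_result = [[(10, 10, 40, 40, 0.9)], [], [(5, 5, 20, 20, 0.8)], []]
only_planes = filter_classes(frame_result, keep={'plane'})
# every class except 'plane' (index 2) is emptied
```

Note this only hides classes at output time; detecting classes the pretrained head was never trained on still requires retraining with a matching num_classes.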
Hello. I met the same problem when I trained in custom datasets. Have you solved this problem?
No. I haven't used MMTracking for a long time. Maybe multi-class MOT is supported now? You can raise an issue for help.
Describe the bug
When testing with epoch_80.pth, I get:
AssertionError: Dataset and results have different sizes: 3724 v.s. 2
My dataset has four classes: car, plane, ship, train. Could the error be because the testing script only supports single-class MOT?