open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0
3.56k stars 598 forks

How to track human in video? #144

Closed by ioctl-user 3 years ago

ioctl-user commented 3 years ago

I am trying to run the VID demo with the following command:

# python demo/demo_vid.py configs/vid/dff/dff_faster_rcnn_r101_dc5_1x_imagenetvid.py --checkpoint https://download.openmmlab.com/mmtracking/vid/dff/dff_faster_rcnn_r101_dc5_1x_imagenetvid/dff_faster_rcnn_r101_dc5_1x_imagenetvid_20201218_172720-ad732e17.pth  --input demo/demo.mp4 --device cpu --output output.mp4
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2021-04-24 19:32:39,179 - mmtrack - INFO - load motion from: https://download.openmmlab.com/mmtracking/pretrained_weights/flownet_simple.pth
2021-04-24 19:32:39,179 - mmtrack - INFO - Use load_from_http loader
Use load_from_http loader
/opt/conda/lib/python3.7/site-packages/mmdet/models/dense_heads/rpn_head.py:192: UserWarning: In rpn_proposal or test_cfg, nms_thr has been moved to a dict named nms as iou_threshold, max_num has been renamed as max_per_img, name of original arguments and the way to specify iou_threshold of NMS will be deprecated.
  'In rpn_proposal or test_cfg, '
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py:3000: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "

In the output video, the car and the bike get bounding boxes, but people do not. How can I track only humans in a video?

GT9505 commented 3 years ago

All the video object detection (VID) models are trained on the ImageNet VID dataset. They can only detect objects from its 30 categories, and person is not one of them. If you only want to track humans, try an MOT method such as DeepSORT or Tracktor.
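
To make the category limitation concrete, here is a minimal sketch that lists the 30 ImageNet VID class names and checks whether a given label is among them. The class list is written out by hand as an assumption (it should match the dataset definition shipped with mmtrack, but verify it against your installed version), and the helper `is_vid_category` is a hypothetical name used only for illustration, not part of the MMTracking API.

```python
# Assumption: the 30 ImageNet VID class names, written out by hand.
# Check them against the dataset class in your installed mmtrack version.
IMAGENET_VID_CLASSES = (
    "airplane", "antelope", "bear", "bicycle", "bird", "bus", "car",
    "cattle", "dog", "domestic_cat", "elephant", "fox", "giant_panda",
    "hamster", "horse", "lion", "lizard", "monkey", "motorcycle", "rabbit",
    "red_panda", "sheep", "snake", "squirrel", "tiger", "train", "turtle",
    "watercraft", "whale", "zebra",
)


def is_vid_category(label: str) -> bool:
    """Return True if `label` is one of the ImageNet VID categories.

    Hypothetical helper for illustration only.
    """
    return label.lower() in IMAGENET_VID_CLASSES


if __name__ == "__main__":
    for label in ("car", "bicycle", "person"):
        status = "detectable" if is_vid_category(label) else "not in ImageNet VID"
        print(f"{label}: {status}")
```

This matches the behaviour reported in the question: the car and the bike fall into VID categories, while a person does not, so a VID model produces no box for people regardless of the demo options.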

ioctl-user commented 3 years ago

Thanks, will try.