levan92 / deep_sort_realtime

A really more real-time adaptation of deep sort
MIT License

Deep SORT

Introduction

A more realtime adaptation of Deep SORT.

Adapted from the official repo of Simple Online and Realtime Tracking with a Deep Association Metric (Deep SORT).

See their paper for more technical information.

Dependencies

requirements.txt lists the default packages required (it installs torch/torchvision for the default MobileNet embedder); modify it as needed.

Main dependencies are:

Install

cd deep_sort_realtime && pip3 install .

Run

Example usage:

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5)
bbs = object_detector.detect(frame)  # your own object detection
# bbs is expected to be a list of detections, each a tuple of ([left, top, w, h], confidence, detection_class)
tracks = tracker.update_tracks(bbs, frame=frame)
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    ltrb = track.to_ltrb()

Example with your own embedder:

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5)
bbs = object_detector.detect(frame) # your own object detection
object_chips = chipper(frame, bbs) # your own logic to crop frame based on bbox values
embeds = embedder(object_chips) # your own embedder to take in the cropped object chips, and output feature vectors
# bbs is expected to be a list of detections, each a tuple of ([left, top, w, h], confidence, detection_class).
# There is no need to pass frame, since your chips have already been embedded.
tracks = tracker.update_tracks(bbs, embeds=embeds)
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    ltrb = track.to_ltrb()
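
Many detectors report boxes as corner coordinates rather than the ( [left,top,w,h], confidence, detection_class ) tuples that update_tracks expects. A minimal conversion sketch follows; the helper name ltrb_to_deepsort_detection and the raw_boxes layout are assumptions made for illustration and are not part of this library.

```python
def ltrb_to_deepsort_detection(left, top, right, bottom, confidence, detection_class):
    """Convert one corner-format box to ([left, top, w, h], confidence, detection_class)."""
    width = right - left
    height = bottom - top
    if width <= 0 or height <= 0:
        raise ValueError("box must have positive width and height")
    return ([left, top, width, height], confidence, detection_class)


# Hypothetical detector output: (left, top, right, bottom, confidence, class)
raw_boxes = [
    (10, 20, 110, 220, 0.9, "person"),
    (300, 40, 360, 100, 0.6, "dog"),
]

# List in the shape described in the usage comments above
bbs = [ltrb_to_deepsort_detection(*box) for box in raw_boxes]
```

The resulting bbs list can then be passed to tracker.update_tracks as in the examples above.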

Getting bounding box of original detection

The original Track.to_* methods for retrieving bounding box values return only the Kalman-predicted values. However, in some applications it is better to return the bounding box values of the original detection that the track was associated with in the current round.

We added an orig argument to all the Track.to_* methods. If orig is True and the track was associated with a detection in this update round, the method returns the bounding box of that original detection. Otherwise, it still returns the Kalman-predicted values.

The orig_strict argument of the Track.to_* methods only takes effect when orig is True. With orig_strict=True, the method returns None when no original detection is associated with the track in the current frame; without it, the method falls back to the Kalman-predicted values as usual.
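
The fallback rules for orig and orig_strict can be summarised in a few lines. This is only a sketch of the behaviour described above, not the library's implementation; select_box and its parameters are hypothetical names.

```python
def select_box(kalman_box, detection_box, orig=False, orig_strict=False):
    """Pick which box to return, following the orig / orig_strict rules.

    kalman_box:    the Kalman-predicted box (always available)
    detection_box: the original detection box, or None if the track was not
                   associated with a detection in this update round
    """
    if not orig:
        # orig_strict is ignored unless orig is True
        return kalman_box
    if detection_box is not None:
        return detection_box
    if orig_strict:
        # Strict mode: no original detection this round means no box at all
        return None
    return kalman_box
```

In practice this means a caller that uses orig=True, orig_strict=True must be ready to handle None for tracks that were not matched in the current frame.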

Storing supplementary info of original detection

Supplementary info can be passed from a detection into its track. The Detection class now has an others argument to store this info and pass it to the associated track during update. Pass it in through the others argument of DeepSort.update_tracks, which expects a list of the same length as raw_detections, and retrieve it through the Track.get_det_supplementary method. One example use is passing in the corresponding instance segmentation masks, to be consumed when iterating through the output tracks.
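
A rough sketch of consuming the supplementary info while iterating through the tracks is shown below. It relies only on the is_confirmed, track_id and get_det_supplementary names mentioned in this README; collect_supplementary is a hypothetical helper, the SimpleNamespace objects are stand-ins for real tracks so the snippet runs without the library, and skipping None values is an assumption about tracks that carry no info.

```python
from types import SimpleNamespace


def collect_supplementary(tracks):
    """Map track_id -> supplementary info for confirmed tracks that carry any."""
    collected = {}
    for track in tracks:
        if not track.is_confirmed():
            continue
        info = track.get_det_supplementary()
        if info is None:
            # Assumption: None means nothing was attached to this track
            continue
        collected[track.track_id] = info
    return collected


# Stand-in objects that mimic only the attributes used above (not real Track objects)
demo_tracks = [
    SimpleNamespace(track_id="1", is_confirmed=lambda: True,
                    get_det_supplementary=lambda: "mask_a"),
    SimpleNamespace(track_id="2", is_confirmed=lambda: False,
                    get_det_supplementary=lambda: "mask_b"),
    SimpleNamespace(track_id="3", is_confirmed=lambda: True,
                    get_det_supplementary=lambda: None),
]
demo_result = collect_supplementary(demo_tracks)
```

With real tracks, the values would be whatever was passed in through others, for example instance segmentation masks.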

Polygon support

Other than horizontal bounding boxes, detections can now be given as polygons. We do not track polygon points per se, but merely convert the polygon to its bounding rectangle for tracking. That said, if embedding is enabled, the embedder works on the crop around the bounding rectangle, with area not covered by the polygon masked away.

When instantiating a DeepSort object (as in deepsort_tracker.py), set the polygon argument to True. See the DeepSort.update_tracks docstring for details on the polygon format. In polygon mode, the original polygon coordinates are passed to the associated track through the supplementary info.
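
To illustrate what converting a polygon to its bounding rectangle means, here is a minimal sketch. It assumes the polygon is a list of (x, y) points, which may differ from the format the library expects (see the DeepSort.update_tracks docstring); polygon_to_ltwh is a hypothetical helper, not a library function.

```python
def polygon_to_ltwh(points):
    """Return the axis-aligned bounding rectangle [left, top, w, h] of a polygon.

    points: list of (x, y) tuples (an assumed format, for illustration only)
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least 3 points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left, top = min(xs), min(ys)
    return [left, top, max(xs) - left, max(ys) - top]


# A four-point polygon and its bounding rectangle
quad = [(10, 5), (50, 20), (30, 60), (5, 40)]
rect = polygon_to_ltwh(quad)
```

Only the rectangle is tracked; the polygon itself matters again when the crop is masked for the embedder.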

Differences from original repo

High-level overview of source files in deep_sort (from original repo)

In package deep_sort is the main tracking code:

Test

python3 -m unittest

Appearance Embedding Network

Pytorch Embedder (default)

Default embedder is a pytorch MobilenetV2 (trained on Imagenet).

For convenience (I know it's not exactly best practice) & since the weights file is quite small, it is pushed in this github repo and will be installed to your Python environment when you install deep_sort_realtime.

TorchReID

Torchreid is a person re-identification library. It is supported here and is especially useful for extracting features of humans. Torchreid needs to be installed (see the Dependencies section above). It provides a zoo of models: select the model type to use, note the model name, and provide them as arguments. Download the corresponding model weights file from the model zoo site and point to the downloaded file. Model 'osnet_ain_x1_0' with domain-generalized training on (MS+D+C) is provided by default, together with the corresponding weights. If you set embedder='torchreid' when initializing the DeepSort object without specifying embedder_model_name or embedder_wts, it defaults to that model.

from deep_sort_realtime.deepsort_tracker import DeepSort
tracker = DeepSort(max_age=5, embedder='torchreid')
bbs = object_detector.detect(frame)  # your own object detection
# bbs is expected to be a list of detections, each a tuple of ([left, top, w, h], confidence, detection_class)
tracks = tracker.update_tracks(bbs, frame=frame)
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    ltrb = track.to_ltrb()

CLIP

CLIP is added as another embedder option due to its proven flexibility and generalisability. Download the CLIP model weights you want using deep_sort_realtime/embedder/weights/download_clip_wts.sh and store them in that same directory, or provide your own CLIP weights through the embedder_wts argument of the DeepSort object.

Tensorflow Embedder

Available now at deep_sort_realtime/embedder/embedder_tf.py, as an alternative to the default PyTorch embedder. Tested on TensorFlow 2.3.1. You need to make your own code changes to use it.

The tf MobilenetV2 weights (pretrained on imagenet) are not available in this github repo (unlike the torch one). Download mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5 from https://github.com/JonathanCMitchell/mobilenet_v2_keras/releases/tag/v1.1. You may drop it into deep_sort_realtime/embedder/weights/ before pip installing.

Background Masking

If instance masks are given to DeepSort.update_tracks and no external appearance embeddings are given, each mask is used to mask out the background of the corresponding detection crop, so that only foreground information goes into the embedder. This reduces background bias.
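
The idea of masking out the background can be sketched on a tiny grayscale crop. This is a plain-Python illustration under simplifying assumptions (nested lists instead of image arrays, a boolean mask, zero as the fill value); mask_background is a hypothetical helper and not how the library does it internally.

```python
def mask_background(crop, mask, fill=0):
    """Keep pixels where mask is truthy and replace the rest with fill.

    crop: 2D list of pixel values (rows of equal length)
    mask: 2D list of booleans with the same shape as crop
    """
    if len(crop) != len(mask) or any(len(r) != len(m) for r, m in zip(crop, mask)):
        raise ValueError("crop and mask must have the same shape")
    return [
        [pixel if keep else fill for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(crop, mask)
    ]


crop = [[1, 2], [3, 4]]
mask = [[True, False], [False, True]]
foreground_only = mask_background(crop, mask)
```

Only the foreground pixels keep their values, so the embedder sees less of the scene behind the object.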

Example

Example cosine distances between images in ./test/ ("diff": rock vs smallapple, "close": smallapple vs smallapple slightly augmented)

.Testing pytorch embedder
close: 0.012196660041809082 vs diff: 0.4409685730934143

.Testing Torchreid embedder
Model: osnet_ain_x1_0
- params: 2,193,616
- flops: 978,878,352
Successfully loaded pretrained weights from "/Users/levan/Workspace/deep_sort_realtime/deep_sort_realtime/embedder/weights/osnet_ain_ms_d_c_wtsonly.pth"
close: 0.012312591075897217 vs diff: 0.4590487480163574
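
For reference, a minimal sketch of how a cosine distance between two feature vectors can be computed, assuming the common definition of one minus the cosine similarity. The test code in this repo may differ in detail; cosine_distance here is an illustrative helper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])  # close to 0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])      # close to 1
```

Under this definition, small values (as in the "close" pairs above) mean the embeddings point in nearly the same direction, and larger values mean they are less alike.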