[Demo] Unable to output multiple bounding boxes per video frame

Description:

I followed the steps in README.md to reproduce the demo result. The script runs successfully, but only one bbox is shown per video frame, and the output text file contains only one bbox prediction per frame. The track ID also never changes.

python3 tools/demo_track.py video -f exps/example/mot/yolox_x_mix_det.py -c pretrained/bytetrack_x_mot17.pth.tar --fp16 --fuse --save_result

I added debug prints to demo_track.py and byte_tracker.py. The predictor appears to output multiple bbox predictions with high scores, but online_targets always contains exactly one item. Looking further, the tracked_stracks attribute of BYTETracker contains 0 items on the first frame and 1 item on every frame afterward.

```python
# tools\demo_track.py (lines 263-265)
online_targets = tracker.update(outputs[0], [img_info['height'], img_info['width']], exp.test_size)
print("outputs", outputs[0][:2, :])  # outputs tensor([[..., 0.9980, 0.9639, 0.0000], [..., 0.9980, 0.9580, 0.0000]])
print("targets", online_targets)     # targets [OT_1_(1-1)]

# yolox\tracker\byte_tracker.py (lines 159-160)
def update(self, output_results, img_info, img_size):
    print("# tracked stracks", len(self.tracked_stracks))  # shows 0 on the 1st iteration, and 1 afterward
```
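To make the per-frame mismatch explicit, one can compare how many detections clear the score threshold with how many targets the tracker returns and how many distinct track IDs those targets carry. This is a minimal sketch, assuming each detection row ends with `[obj_score, cls_score, cls_id]` like the printed outputs tensor, and that the combined score is `obj_score * cls_score`. `frame_report` and `Target` are hypothetical names for illustration, not ByteTrack APIs, and the threshold default of 0.5 is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Target:
    """Hypothetical stand-in for one tracker output; only track_id is used."""
    track_id: int


def frame_report(detections: Sequence[Sequence[float]],
                 targets: Sequence[Target],
                 track_thresh: float = 0.5) -> Dict[str, int]:
    """Compare confident detections with the tracker's output for one frame.

    Assumes each detection row ends with [obj_score, cls_score, cls_id]
    and that the combined score is obj_score * cls_score.
    """
    confident = sum(1 for row in detections if row[-3] * row[-2] > track_thresh)
    return {
        "confident_detections": confident,
        "targets": len(targets),
        "unique_ids": len({t.track_id for t in targets}),
    }


# Two confident detections (scores copied from the printout; box coordinates
# are made up) but only one returned target, as in the failing run.
dets = [
    [10.0, 20.0, 50.0, 90.0, 0.9980, 0.9639, 0.0],
    [60.0, 25.0, 95.0, 100.0, 0.9980, 0.9580, 0.0],
]
print(frame_report(dets, [Target(track_id=1)]))
# {'confident_detections': 2, 'targets': 1, 'unique_ids': 1}
```

A persistent gap like this on every frame, with all targets sharing one ID, suggests the detections are being lost inside the tracker rather than by the detector.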

I have tried setting the match_thresh and track_thresh arguments to various values (e.g. [match_thresh, track_thresh] = [0, 1], [1, 0], [0.5, 0.5]), but it did not help. How can I get multiple bounding boxes in a single video frame, as in the demo video? Thank you 🙇‍♂️

Environment:

OS: Windows 11 Home 21H2
Python version: 3.9.10

Update:

I found the cause of this issue. To resolve a different error, AttributeError: 'STrack' object has no attribute '_count', I had followed the discussion in #210 and added super(STrack, self).__init__() inside the STrack __init__() method. That change produces the behavior described in this issue (only one bbox per frame). The issue is also reproducible in the Colab demo notebook by adding super(STrack, self).__init__() to the __init__() method of the STrack class in yolox/tracker/byte_tracker.py.
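One plausible mechanism, offered as a hypothesis rather than a confirmed diagnosis (the exact edits from #210 are not shown here): the printed target `OT_1_(1-1)` suggests every new track gets ID 1. If the edits moved the track-ID counter from the class onto each instance, so that calling the base `__init__()` resets it, every track would receive ID 1, and any step that merges track lists keyed by ID would then keep only one of them. The sketch below contrasts the two counter styles. `SharedCounterBase`, `PerInstanceBase`, and `merge_by_id` are hypothetical, simplified stand-ins, not the actual ByteTrack classes or helpers.

```python
class SharedCounterBase:
    """Counter lives on the class, so every track draws from one sequence."""
    _count = 0

    @staticmethod
    def next_id():
        SharedCounterBase._count += 1
        return SharedCounterBase._count


class SharedTrack(SharedCounterBase):
    def __init__(self):
        self.track_id = self.next_id()


class PerInstanceBase:
    """Hypothetical variant: __init__ stores the counter on the instance."""

    def __init__(self):
        self._count = 0  # every new track starts its own counter at 0

    def next_id(self):
        self._count += 1
        return self._count


class PerInstanceTrack(PerInstanceBase):
    def __init__(self):
        super().__init__()              # resets this track's counter
        self.track_id = self.next_id()  # therefore always 1


def merge_by_id(existing, new):
    """Simplified merge that skips any track whose ID is already present."""
    seen = {t.track_id for t in existing}
    merged = list(existing)
    for t in new:
        if t.track_id not in seen:
            seen.add(t.track_id)
            merged.append(t)
    return merged


good = merge_by_id([], [SharedTrack() for _ in range(3)])
bad = merge_by_id([], [PerInstanceTrack() for _ in range(3)])
print([t.track_id for t in good])  # [1, 2, 3]
print([t.track_id for t in bad])   # [1]
```

Under this reading, tuning match_thresh or track_thresh cannot help, because the extra tracks are dropped when lists are merged by ID, not when detections are matched.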

Solution:

I checked out the previous commit 2c082be in my clone and reverted all the changes I had made to byte_tracker.py and basetrack.py. demo_track.py now successfully captures multiple bounding boxes per frame.

ifzhang / ByteTrack