timmeinhardt / trackformer

Implementation of "TrackFormer: Multi-Object Tracking with Transformers". [Conference on Computer Vision and Pattern Recognition (CVPR), 2022]
https://arxiv.org/abs/2101.02702
Apache License 2.0
521 stars 117 forks

About MOT dataset PREV #129

Open Tsai-chia-hsiang opened 3 months ago

Tsai-chia-hsiang commented 3 months ago

At https://github.com/timmeinhardt/trackformer/blob/e468bf156b029869f6de1be358bc11cd1f517f3c/src/trackformer/datasets/mot.py#L56-L59

Here, I would like to ask why prev_frame_id may take the ID of a subsequent frame. For example:

With frame_id = 90, self._prev_frame_range = 5 and self.seq_length = 100, the call

prev_frame_id = random.randint(max(0, 90 - 5), min(90 + 5, 100))

becomes random.randint(85, 95), which can return prev_frame_id = 91. That is actually the next frame.
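The arithmetic in this example can be checked with a minimal sketch. It only mirrors the expression as quoted in this comment, not necessarily the exact code in mot.py, and the helper name `prev_frame_bounds` is hypothetical (it does not exist in the repository).

```python
def prev_frame_bounds(frame_id, prev_frame_range, seq_length):
    """Return the inclusive (low, high) bounds handed to random.randint.

    Hypothetical helper that mirrors the expression quoted in this issue.
    """
    low = max(0, frame_id - prev_frame_range)
    high = min(frame_id + prev_frame_range, seq_length)
    return low, high


frame_id = 90
low, high = prev_frame_bounds(frame_id, prev_frame_range=5, seq_length=100)

# random.randint includes both endpoints, so every value in this list can be drawn.
candidates = list(range(low, high + 1))
later = [f for f in candidates if f > frame_id]

print(low, high)  # 85 95
print(later)      # [91, 92, 93, 94, 95]
```

Under these assumptions, five of the eleven possible values are frames that come after the current one.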

In the next few lines https://github.com/timmeinhardt/trackformer/blob/e468bf156b029869f6de1be358bc11cd1f517f3c/src/trackformer/datasets/mot.py#L61-L64

These lines do not seem to handle the ordering between the preceding and subsequent frames, and the keys in `target` do not carry the frame number either.

Why can prev_image and prev_target hold information from a later frame when they are named PREVIOUS?

Thanks.

Maxvgrad commented 3 months ago

I also found this piece of code confusing. There's another edge case that might result in the same frame_id as the current one.

I'm curious if it's an issue to have a future or identical frame as the previous frame.
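To make both cases concrete, here is a rough sketch that enumerates every value the sampling range allows and sorts it into earlier, same, or later than the current frame. It assumes the bounds expression quoted in the first comment; `classify_candidates` is a hypothetical name used only for illustration.

```python
def classify_candidates(frame_id, prev_frame_range, seq_length):
    """Count candidate prev_frame_id values relative to frame_id.

    Assumes the inclusive bounds quoted in this issue:
    randint(max(0, frame_id - r), min(frame_id + r, seq_length)).
    """
    low = max(0, frame_id - prev_frame_range)
    high = min(frame_id + prev_frame_range, seq_length)
    counts = {"earlier": 0, "same": 0, "later": 0}
    for candidate in range(low, high + 1):
        if candidate < frame_id:
            counts["earlier"] += 1
        elif candidate == frame_id:
            counts["same"] += 1
        else:
            counts["later"] += 1
    return counts


mid_sequence = classify_candidates(frame_id=90, prev_frame_range=5, seq_length=100)
first_frame = classify_candidates(frame_id=0, prev_frame_range=5, seq_length=100)

print(mid_sequence)  # {'earlier': 5, 'same': 1, 'later': 5}
print(first_frame)   # {'earlier': 0, 'same': 1, 'later': 5}
```

With these example values, the identical frame is always one of the candidates, and at the first frame of a sequence no earlier frame is available at all.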

I would also be interested in the rationale behind this code.

Cheers!

userkw2 commented 3 months ago

Hi @Maxvgrad, I want to use TrackFormer for multi-object tracking with my own dataset. According to the instructions, I need to create a COCO-style annotation file and extend it with fields like seq_length, first_frame_image_id, and track_id for multi-object tracking. I'm unsure how to prepare my dataset with these extensions and how to use the generate_coco_from_mot.py script. Could you guide me on how to structure my dataset and generate the necessary COCO-style annotations for TrackFormer?

Maxvgrad commented 3 months ago

@Tsai-chia-hsiang btw it seems like this logic is mentioned in the paper:

"The frame t − 1 for step (i) is sampled from a range of frames around frame t, thereby generating challenging frame pairs where the objects have moved substantially from their previous position. Such a sampling allows for the simulation of camera motion and low frame rates from usually benevolent sequences."

Tsai-chia-hsiang commented 3 months ago

Oh, ok, I see. Thanks!