megvii-research / MOTRv2

[CVPR2023] MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

custom data #71

Open zzzz737 opened 3 months ago

zzzz737 commented 3 months ago

Does training on a custom dataset not support videos in which a large number of frames have no detection results? For example, there are targets to track in frames 1 to 6, and then again in frames 30 to 35.

[Image]

This section of the code seems to take the smallest and largest frame numbers of a video and then read the tracking targets of every frame in between. How are frames between the smallest and largest frame handled if they contain no tracking targets?

back2zack commented 2 months ago

Hi,

I am not sure I understood your question, but the section of code you highlighted is responsible for initializing the indices of each sequence. It does not load the data, nor is it used directly during training. Instead, it initializes the frame indices, which are later used by the getitem function to load the corresponding frame for each index and to provide the following frames based on the sample length and sample_interval.
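To make the indexing step described above concrete, here is a minimal sketch. It is not the repository's code: `build_indices`, `frames_for`, and the `labels` layout (a dict keyed by `(video, frame)`) are hypothetical names chosen for illustration. It only assumes that start indices run from the smallest to the largest labeled frame of a video and that a clip is formed by stepping `sample_interval` frames `sample_length` times.

```python
from collections import defaultdict


def build_indices(labels, sample_length, sample_interval):
    """Return every (video, start_frame) from which a full clip fits.

    labels: dict mapping (video, frame) -> list of boxes (hypothetical layout).
    """
    frames_per_video = defaultdict(list)
    for video, frame in labels:
        frames_per_video[video].append(frame)

    # Number of frames between the first and last frame of one clip.
    span = (sample_length - 1) * sample_interval
    indices = []
    for video, frames in sorted(frames_per_video.items()):
        first, last = min(frames), max(frames)
        # Only the min and max are used, so gaps in between are not noticed.
        for start in range(first, last - span + 1):
            indices.append((video, start))
    return indices


def frames_for(index, sample_length, sample_interval):
    """Expand one start index into the frame keys of its clip."""
    video, start = index
    return [(video, start + k * sample_interval) for k in range(sample_length)]


# Frames 1-6 and 30-35 are labeled; frames 7-29 have no entry at all.
labels = {("v", f): [[0, 0, 10, 10]] for f in list(range(1, 7)) + list(range(30, 36))}
indices = build_indices(labels, sample_length=3, sample_interval=1)
clip = frames_for(("v", 5), sample_length=3, sample_interval=1)
missing = [key for key in clip if key not in labels]
```

In this sketch the start index `("v", 5)` is generated even though its clip reaches frame 7, which has no labels. That is the situation the question is about: an index built only from the minimum and maximum frame can point into a gap.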

zzzz737 commented 2 months ago

Thanks for your answer! As you said, this code only generates the frame indices. During training, however, it seems that a frame index is picked at random and the corresponding detection results are fetched through the getitem function. My question is: if the training data contains frames that have no detection results, training reports an error. What should I do with this kind of data?

back2zack commented 2 months ago

I would delete the frames that have no labels (no detections) if they break the training.
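One way to apply this suggestion is sketched below, under stated assumptions. `drop_empty_frames` and the `labels` layout (a dict keyed by `(video, frame)`) are hypothetical and not part of the repository. The sketch drops frames without boxes and renumbers the remaining frames of each video consecutively, so that a min-to-max index range no longer contains gaps.

```python
from collections import defaultdict


def drop_empty_frames(labels):
    """Remove frames without boxes and renumber the rest from 1.

    labels: dict mapping (video, frame) -> list of boxes (hypothetical layout).
    Returns (new_labels, frame_map), where frame_map maps each kept
    (video, old_frame) to its new frame number.
    """
    kept = defaultdict(list)
    for (video, frame), boxes in labels.items():
        if boxes:  # keep only frames with at least one box
            kept[video].append(frame)

    new_labels, frame_map = {}, {}
    for video, frames in kept.items():
        for new_frame, old_frame in enumerate(sorted(frames), start=1):
            new_labels[(video, new_frame)] = labels[(video, old_frame)]
            frame_map[(video, old_frame)] = new_frame
    return new_labels, frame_map


# Frame 2 has no boxes, so frames 1 and 3 become frames 1 and 2.
labels = {
    ("v", 1): [[0, 0, 10, 10]],
    ("v", 2): [],
    ("v", 3): [[2, 2, 12, 12]],
}
new_labels, frame_map = drop_empty_frames(labels)
```

The returned `frame_map` would also be needed to rename or re-link the image files so they match the new numbering. Note that closing a long gap this way makes frames that were far apart in time adjacent in the renumbered sequence, which may or may not be acceptable for a given dataset.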

zzzz737 commented 2 months ago

Aha, my idea is the same as yours. I have one more question about a missed frame in the middle of a track. Suppose the ground truth contains a target, say id 0, in frames 1, 2 and 3, but the detection for frame 2 is missing. After deleting frame 2, should frames 1 and 3 remain in the ground truth for that target?

However, in my dataset each tracking target may appear in only 2 to 3 frames, and there may be missed detections in between. After training, the test results are very poor, and I don't know what the problem is.