How can I perform mixed training with DET and VID datasets, considering that the DET dataset does not consist of consecutive frames?
In your code, you mentioned performing only a single-stage mixed training. How can the DET dataset be incorporated into this mixed training approach?
How can I perform mixed training with DET and VID datasets, considering that the DET dataset does not consist of consecutive frames? In your code, you mentioned performing only a single-stage mixed training. How can the DET dataset be incorporated into this mixed training approach?