OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
This seems abnormal. You need 12 days to train this model according to the log, but generally two to three days is enough. It seems that your data reading time is abnormal. You can check it.