Open iamweiweishi opened 2 years ago
SELSA only supports two-stage detectors.
This is because the SELSA module is designed to aggregate proto-detections: it weights the proto-detections of the reference frames by their similarity to the target frame's proto-detections, then aggregates them to obtain richer, more robust features. Weighting the reference-frame proto-detections against the target's proto-detections basically ensures that only detections of the same object instance contribute to the aggregation.
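To make the weighting idea concrete, here is a rough sketch in plain Python. It is not the SELSA implementation: it assumes cosine similarity followed by a softmax as the weighting, and the names `aggregate`, `cosine`, `softmax`, and `temperature` are made up for illustration. The point is only that each target proto-detection needs its own feature vector to compare against the reference ones, which is what a two-stage detector's proposal stage provides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(target_feats, ref_feats, temperature=0.1):
    """For each target proto-detection, return a similarity-weighted
    average of the reference proto-detection features.

    `temperature` is a hypothetical knob that sharpens the weights;
    it is an assumption of this sketch, not a SELSA parameter.
    """
    dim = len(ref_feats[0])
    aggregated = []
    for t in target_feats:
        weights = softmax([cosine(t, r) / temperature for r in ref_feats])
        aggregated.append(
            [sum(w * r[d] for w, r in zip(weights, ref_feats)) for d in range(dim)]
        )
    return aggregated

# Toy example: one target proto-detection, two reference proto-detections.
target = [[1.0, 0.0]]
refs = [[0.9, 0.1], [0.0, 1.0]]  # the first one resembles the target
agg = aggregate(target, refs)
```

In this toy case the aggregated feature ends up very close to the first reference vector, because the dissimilar second reference receives almost no weight.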
Currently, SELSA uses Faster R-CNN as the main structure for video detection. Anchor-free detectors like YOLOX are showing strong performance. I wonder whether I could use YOLOX as the main network structure for VID?