Describe the feature
The multi-scale deformable attention provided in mmcv is the batch version. Could mmcv also support a stack version?
Motivation
Ex1. The batch version is inconvenient for two-stage multi-camera detection tasks, such as those on the nuScenes dataset, because the number of proposals produced by the first stage is not constant.
Ex2. A similar recent implementation can be found in OpenPCDet, which provides a stack version of PointNet. Could multi-scale deformable attention get a similar implementation?
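To make the request concrete, here is a minimal sketch of the two layouts, using plain Python lists in place of tensors. All function names here are hypothetical and are not part of mmcv or OpenPCDet. The "batch" layout pads every sample to the longest length and needs a mask, while the "stack" layout concatenates all queries and keeps a per-sample count, which is my understanding of what a stack version would take as input.

```python
from itertools import accumulate


def to_padded(per_sample_queries, pad_value=None):
    """Batch layout (hypothetical helper): pad each sample to the longest length."""
    max_len = max(len(q) for q in per_sample_queries)
    padded = [list(q) + [pad_value] * (max_len - len(q)) for q in per_sample_queries]
    # The mask marks which slots hold real queries rather than padding.
    mask = [[True] * len(q) + [False] * (max_len - len(q)) for q in per_sample_queries]
    return padded, mask


def to_stacked(per_sample_queries):
    """Stack layout (hypothetical helper): concatenate queries, keep per-sample counts."""
    counts = [len(q) for q in per_sample_queries]
    stacked = [item for q in per_sample_queries for item in q]
    return stacked, counts


def split_stacked(stacked, counts):
    """Recover the per-sample lists from the stacked layout via cumulative offsets."""
    offsets = [0, *accumulate(counts)]
    return [stacked[start:end] for start, end in zip(offsets, offsets[1:])]


# Three samples with 3, 1 and 2 first-stage proposals respectively.
proposals = [["a0", "a1", "a2"], ["b0"], ["c0", "c1"]]

padded, mask = to_padded(proposals)
stacked, counts = to_stacked(proposals)
restored = split_stacked(stacked, counts)
```

With these inputs the padded layout allocates 9 slots for 6 real proposals, while the stacked layout stores exactly 6 entries plus the count list. A stack version of the attention op would presumably index the value features of each sample through such counts instead of a fixed batch dimension.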
Related resources
The PointNet stack version in OpenPCDet
The RoI Align op in mmcv
Additional context