lxtGH closed this issue 3 years ago.
I think the improvement mainly comes from the additional information provided by regressing mask embeddings.
In addition to the global image features, I also adjusted the crop operation used for data augmentation so that training fits on 4 GPUs. Compared with Sparse R-CNN/DETR, the crop type is changed to "relative" with size "(0.7, 0.7)"; see details here. I am not sure whether this adjustment improves the detection results. Please let me know if you obtain further results.
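To make the "relative" crop setting concrete, here is a minimal Python sketch of how the crop size could be computed in that mode, alongside an "absolute_range" mode for comparison. It assumes semantics similar to detectron2's `RandomCrop` (in "relative" mode the crop height and width are fractions of the image height and width). The function names `crop_size` and `random_crop_box` are hypothetical, and the `(384, 600)` range in the usage is illustrative only, not taken from this thread.

```python
import random
from typing import Optional, Tuple


def crop_size(
    image_hw: Tuple[int, int],
    crop_type: str,
    size: Tuple[float, float],
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Return (crop_h, crop_w) for an image of shape (h, w).

    Assumed semantics (modeled on detectron2-style RandomCrop):
      - "relative": size = (frac_h, frac_w), crop is a fixed fraction of the image.
      - "absolute_range": size = (lo, hi), each side is sampled in [lo, hi] pixels,
        clamped so it never exceeds the image side.
    """
    h, w = image_hw
    if crop_type == "relative":
        frac_h, frac_w = size
        # Round to the nearest pixel; with fractions <= 1 the crop fits the image.
        return int(h * frac_h + 0.5), int(w * frac_w + 0.5)
    if crop_type == "absolute_range":
        lo, hi = int(size[0]), int(size[1])
        if lo > hi:
            raise ValueError("absolute_range expects (min, max) with min <= max")
        rng = rng or random.Random()
        # random.randint is inclusive on both ends.
        ch = rng.randint(min(h, lo), min(h, hi))
        cw = rng.randint(min(w, lo), min(w, hi))
        return ch, cw
    raise ValueError(f"unsupported crop_type: {crop_type!r}")


def random_crop_box(
    image_hw: Tuple[int, int], crop_hw: Tuple[int, int], rng: random.Random
) -> Tuple[int, int, int, int]:
    """Pick a random top-left corner; return the crop as (x0, y0, width, height)."""
    h, w = image_hw
    ch, cw = crop_hw
    y0 = rng.randint(0, h - ch)
    x0 = rng.randint(0, w - cw)
    return x0, y0, cw, ch


if __name__ == "__main__":
    rng = random.Random(0)
    image_hw = (800, 1333)
    rel = crop_size(image_hw, "relative", (0.7, 0.7))
    print("relative crop size:", rel)
    print("relative crop box:", random_crop_box(image_hw, rel, rng))
    absr = crop_size(image_hw, "absolute_range", (384, 600), rng)
    print("absolute_range crop size:", absr)
```

With "relative" (0.7, 0.7), the crop always keeps 70% of each side, so larger images yield proportionally larger crops; an "absolute_range" crop instead has a pixel size that does not depend on the input resolution (beyond clamping).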
Thanks for your reply.
@hujiecpp Hi! Thanks for open-sourcing your code; it is amazing work. I do not understand why the box AP is higher than that of Sparse R-CNN. From the code, I found that you use global image features for query feature learning. However, when I added this to Sparse R-CNN (mmdet version), I did not see any improvement.