hujiecpp / ISTR

ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

Why is the box AP higher than Sparse R-CNN's? #2

Closed lxtGH closed 3 years ago

lxtGH commented 3 years ago

@hujiecpp Hi! Thanks for open-sourcing your code. It is amazing work. I do not understand why the box AP is higher than Sparse R-CNN's. From the code, I see that you use global image features for query feature learning. However, when I add this to Sparse R-CNN (the mmdet version), I do not see any improvement.

hujiecpp commented 3 years ago

I think the improvement comes mainly from the additional information provided by regressing mask embeddings:

  1. Bipartite matching with additional mask embeddings provides higher quality matching results for training.
  2. The bounding box detection could benefit from multi-task learning.
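To illustrate the first point, here is a toy sketch of how an extra mask-embedding term can change a bipartite matching. It is not the ISTR matching cost: the L1 box cost, the cosine distance on embeddings, the weights, and all names (`match`, `w_box`, `w_mask`) are assumptions for illustration, and a brute-force search over permutations stands in for the Hungarian algorithm so the example needs only the standard library.

```python
from itertools import permutations
import math


def l1(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def match(pred_boxes, pred_embs, gt_boxes, gt_embs, w_box=1.0, w_mask=1.0):
    """Return a tuple m where m[g] is the prediction index assigned to ground truth g.

    Brute force over all assignments (fine for a toy example only).
    """
    def cost(p, g):
        return (w_box * l1(pred_boxes[p], gt_boxes[g])
                + w_mask * cosine_distance(pred_embs[p], gt_embs[g]))

    return min(
        permutations(range(len(pred_boxes)), len(gt_boxes)),
        key=lambda perm: sum(cost(p, g) for g, p in enumerate(perm)),
    )


# Two heavily overlapping objects: the boxes are almost identical,
# but the (hypothetical) mask embeddings are clearly different.
gt_boxes = [(0.50, 0.50, 0.40, 0.40), (0.52, 0.50, 0.40, 0.40)]
gt_embs = [(1.0, 0.0), (0.0, 1.0)]
pred_boxes = [(0.50, 0.50, 0.40, 0.40), (0.52, 0.50, 0.40, 0.40)]
pred_embs = [(0.0, 1.0), (1.0, 0.0)]

box_only = match(pred_boxes, pred_embs, gt_boxes, gt_embs, w_mask=0.0)
box_and_mask = match(pred_boxes, pred_embs, gt_boxes, gt_embs, w_mask=1.0)
print(box_only)      # (0, 1): boxes alone pair each prediction with the nearest box
print(box_and_mask)  # (1, 0): the mask term flips the assignment
```

In this toy case the box cost barely separates the two candidates, so the mask term decides which prediction is trained against which object.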

Besides the global image features, I also adjusted the crop operation for data augmentation so that training works with 4 GPUs. Compared to SparseRCNN/DETR, the crop type is changed to "relative" with size "(0.7, 0.7)"; see details here. I am not sure whether this adjustment improves the detection results. Please let me know if you obtain further results.
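For readers unfamiliar with the crop setting, here is a small sketch of what a "relative" crop of size (0.7, 0.7) amounts to. It assumes "relative" means the crop height and width are fixed fractions of the image height and width, with a uniformly random position; the function name `relative_crop_window` is hypothetical and this is not the repository's code.

```python
import random


def relative_crop_window(height, width, rel_size=(0.7, 0.7), rng=random):
    """Pick a random crop whose size is a fixed fraction of the image size.

    Returns (top, left, crop_h, crop_w) in pixels.
    """
    rel_h, rel_w = rel_size
    # Round to the nearest pixel.
    crop_h = int(height * rel_h + 0.5)
    crop_w = int(width * rel_w + 0.5)
    # Choose the top-left corner so the crop stays inside the image.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


top, left, crop_h, crop_w = relative_crop_window(480, 640)
print(crop_h, crop_w)  # 336 448
```

Each cropped image keeps about 49% of the original pixels (0.7 × 0.7), which lowers per-image memory use compared with training on the uncropped image.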

lxtGH commented 3 years ago

Thanks for your reply.