Open andrefaraujo opened 5 years ago
I hope these explanations are helpful for you.
Thanks for all of these answers! I understand much better now, but am still confused about a few things:
2. Let me check if my understanding is correct: there are 801K items in total. However, IIUC what you say, some items are not associated with any identity. So there are 43.8K*12.7=556K items with associated identities, and 801K-556K=245K items with no associated identity. Is this correct?
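For reference, the item-count arithmetic in point 2 can be sanity-checked with a quick script that uses only the numbers quoted in the question:

```python
# Sanity-check the item counts discussed above (numbers from the question).
total_items = 801_000          # total clothing items in the dataset
identities = 43_800            # number of identities
items_per_identity = 12.7      # average items per identity

with_identity = identities * items_per_identity
without_identity = total_items - with_identity

print(f"items with identity:    {with_identity / 1000:.0f}K")    # ~556K
print(f"items without identity: {without_identity / 1000:.0f}K")  # ~245K
```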
5. IIUC, this means that a false positive detection is not considered in the scoring. Is this correct?
6. I got a little more confused here, sorry :) From what you are saying, it sounds to me like in all cases Match R-CNN is trained with all heads, right? For the "pose" case, would this mean that the match network (MN) features come from the 14x14x512/28x28x32 maps instead of the 14x14x256 maps from the detector? Also, I am wondering what "pose+class" means in this case: are the feature maps from the detector and the "landmark" branch merged in the input to the MN?
Thanks again :)
The dataset statistics show 390,884 images, but the released data has only 191,961 images.
Half of the training set has been released at present.
1. Is your benchmark trained on the released dataset? 2. I trained a detection model on the released dataset with Mask R-CNN; the mAP is 0.60, far from your benchmark. Can you provide some advice?
Thanks very much. The config YAML I used is mask_rcnn_r50_caffe_c4_1x.py. I just converted the released dataset to COCO format and trained on 8 GPUs for 12 epochs in total, with img_scale (1333, 800). Is your Match R-CNN implemented with mmdetection?
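For anyone else doing the COCO conversion step mentioned above, here is a minimal sketch of such a converter. It assumes per-image JSON files whose `itemN` entries carry `category_id` and a `bounding_box` in [x1, y1, x2, y2] form; the function name and the exact field names are assumptions to check against the actual release.

```python
import glob
import json
import os

def deepfashion2_to_coco(anno_dir, out_path):
    """Merge per-image annotation JSONs into a single COCO-style file.

    Assumes each JSON holds 'itemN' entries with 'category_id' and a
    'bounding_box' given as [x1, y1, x2, y2] (field names are assumptions;
    adjust to the actual release if they differ).
    """
    images, annotations = [], []
    ann_id = 1
    for img_id, path in enumerate(sorted(glob.glob(os.path.join(anno_dir, "*.json"))), 1):
        with open(path) as f:
            anno = json.load(f)
        images.append({"id": img_id,
                       "file_name": os.path.basename(path).replace(".json", ".jpg")})
        for key, item in anno.items():
            if not key.startswith("item"):  # skip metadata like 'source', 'pair_id'
                continue
            x1, y1, x2, y2 = item["bounding_box"]
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": item["category_id"],
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses [x, y, w, h]
                "area": (x2 - x1) * (y2 - y1),
                "iscrowd": 0,
            })
            ann_id += 1
    coco = {"images": images, "annotations": annotations,
            # DeepFashion2 has 13 clothing categories
            "categories": [{"id": i, "name": str(i)} for i in range(1, 14)]}
    with open(out_path, "w") as f:
        json.dump(coco, f)
```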
Match R-CNN is implemented in Detectron https://github.com/facebookresearch/Detectron with e2e_faster_rcnn_R-50-FPN_1x.yaml
Thanks @geyuying for all answers to my questions! I understand everything much better now :)
Could you please share your setting of the training? (input size, training epochs, learning rate) Thanks a lot.
@LouisLang1002 input size (800, 1333); initial learning rate 0.02, dropped to 0.002 after epoch 16 and to 0.0002 after epoch 22; training ends at epoch 24.
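Written out as code, the schedule described above looks like this (a small sketch; only the epoch boundaries and rates from the reply are used, and treating epochs 16 and 22 themselves as belonging to the higher-rate phase is an assumption):

```python
def learning_rate(epoch):
    """Step LR schedule from the reply above: 0.02 initially,
    0.002 after epoch 16, 0.0002 after epoch 22."""
    if epoch <= 16:
        return 0.02
    if epoch <= 22:
        return 0.002
    return 0.0002

# Training ends at epoch 24, so the full schedule covers epochs 1..24.
schedule = {e: learning_rate(e) for e in range(1, 25)}
```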
Thanks for your guidance. Have you tried a deeper backbone instead of R-50? And why did you use Faster R-CNN instead of Mask R-CNN? Looking forward to your reply.
@LouisLang1002 A deeper backbone leads to OOM in my experiments. Actually, we use Mask R-CNN, but with e2e_faster_rcnn_R-50-FPN_1x.yaml for the detection model, not Faster R-CNN.
I'm confused: which model did you use to get mAP 63.8 on the released dataset?
(table comparing Model A and Model B)
@LouisLang1002 Model A for detection, which gets mAP 63.8. Sorry that I misunderstood your question. We didn't use Mask R-CNN because we want to evaluate detection performance with bounding boxes only, not masks.
@geyuying Hi, I see that different categories have different numbers of landmarks. How do you train the landmarks? Do you use the max number of landmarks over all categories for the landmark output channels so that you can train all categories at the same time, or do you train each category separately with a different model? Thanks!
@vinjohn Using the max number of landmarks over all categories (294 in total) for the landmark output channels.
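One way the shared landmark head can be organized is sketched below: every category owns a fixed slice of the 294 output channels, and only that slice is read out for a given RoI. This is an illustration, not the authors' code; the per-category landmark counts are the commonly cited DeepFashion2 numbers (they sum to 294), and the contiguous-slice layout is an assumption.

```python
import numpy as np

# Landmark counts per category id 1..13 (commonly cited DeepFashion2
# numbers; verify against the dataset definition). They sum to 294.
landmarks_per_category = {1: 25, 2: 33, 3: 31, 4: 39, 5: 15, 6: 15, 7: 10,
                          8: 14, 9: 8, 10: 29, 11: 37, 12: 19, 13: 19}
assert sum(landmarks_per_category.values()) == 294

# Assign each category a contiguous slice of the 294 output channels.
offsets, start = {}, 0
for cat, n in sorted(landmarks_per_category.items()):
    offsets[cat] = (start, start + n)
    start += n

def select_channels(heatmaps, category_id):
    """Pick the channels belonging to one category from the shared
    294-channel landmark output (heatmaps: array of shape [294, H, W])."""
    lo, hi = offsets[category_id]
    return heatmaps[lo:hi]

maps = np.zeros((294, 28, 28))
print(select_channels(maps, 2).shape)  # (33, 28, 28)
```

At training time the loss would likewise be computed only on the slice matching the RoI's ground-truth category, so all 13 categories can share one head.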
- Your understanding is correct. In fact, for the pose case, the match network features come from the 14x14x256 map (after RoIAlign), and for the class case, from the 7x7x256 map (after RoIAlign). They achieve better retrieval results compared with other feature maps.
I'm confused about your explanation: how do you combine these two features and feed them into the MN, since the feature map sizes are different?
@geyuying Could you please tell me the STEPS_PER_EPOCH and VALIDATION_STEPS values that you used for training? I mean, during a single epoch do you train on the whole dataset, or is STEPS_PER_EPOCH 3750 (90000/24)? Also, if I understood correctly, do you train only the heads of Mask R-CNN? Thank you!!
Thanks for this great work, seems like a valuable contribution to the computer vision community!
I have a few detailed questions on the paper, which I hope you could clarify (apologies if I missed something in the paper):
Thanks in advance for the clarifications!