youngfly11 / LCMCG-PyTorch

AAAI2020-The official implementation of "Learning Cross-modal Context Graph for Visual Grounding"
57 stars, 12 forks

some questions about the proposal #10

Open JCZ404 opened 2 years ago

JCZ404 commented 2 years ago

Hi, thank you for your great work; the code is excellent! May I ask two questions about the algorithm?

  1. How did you generate a fixed number of pre-computed boxes? The paper says to first generate M = 100 pre-computed boxes as region proposals, but when I use Faster R-CNN, NMS makes it impossible to guarantee a fixed number. In test mode, Faster R-CNN runs NMS twice: once on the proposals generated by the RPN, and once on the boxes generated by the ROI head. After the first NMS there are still many proposals, but after the second, per-class NMS only a few boxes survive. bottom-up-attention, however, uses the boxes produced by the ROI head. With the Faster R-CNN weight file you provided and roi_head_nms=0.3 as you suggested, I get far fewer pre-computed boxes, e.g. 56 or 18, and sometimes fewer than topN=10, which causes an error when running your code. My guess is that a well-trained Faster R-CNN already localizes objects accurately, so most boxes land at the same position and are removed by the ROI head NMS. For now I set nms=0.7, but I am still a bit confused about this.
  2. Which dataset did you pre-train the Faster R-CNN on for generating the pre-computed boxes? Your code seems to use a Faster R-CNN pre-trained on COCO, which covers only 81 categories. Flickr30K Entities contains many categories that do not exist in COCO, so does this matter? I ask because most works use a Faster R-CNN pre-trained on Visual Genome, which covers 1600 classes plus some attribute labels.

youngfly11 commented 2 years ago

Thanks for your interest.

  1. We extract at most 100 proposals per image. In practice, we keep at least 10 and at most 100 boxes by setting only the NMS threshold and the score threshold.
  2. Most of our results were obtained with the BUTD detector. Our ResNet-50 Faster R-CNN experiments on COCO are only for a fair comparison with SeqGROUND.
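
The "at least 10, at most 100" rule from point 1 could be sketched roughly as follows. This is a minimal illustration, not code from LCMCG-PyTorch: the helper name `select_boxes` and the parameters `min_boxes` and `max_boxes` are hypothetical, and the scores are assumed to be one confidence value per candidate box.

```python
def select_boxes(scores, score_thresh=0.1, min_boxes=10, max_boxes=100):
    """Return indices of the boxes to keep, highest score first.

    Hypothetical helper: `scores` is assumed to hold one confidence
    value per candidate box (for example, the best class score of
    each box after per-class NMS).
    """
    # Rank all candidates by confidence, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # First try the plain score threshold.
    keep = [i for i in order if scores[i] >= score_thresh]

    # Too few confident boxes: fall back to the top-scoring candidates,
    # even if some of them are below the threshold.
    if len(keep) < min_boxes:
        keep = order[:min_boxes]

    # Never return more than max_boxes.
    return keep[:max_boxes]


if __name__ == "__main__":
    demo_scores = [0.90, 0.05, 0.50, 0.02]
    print(select_boxes(demo_scores, score_thresh=0.1, min_boxes=3))
```

Note that the fallback can only return boxes that are still in the candidate pool. If NMS has already discarded all but a handful of boxes, fewer than `min_boxes` come back, which matches the failure described in the question; the lower bound only holds when the pool passed in is large enough.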

Yongfei Liu

ShanghaiTech University



JCZ404 commented 2 years ago

Thanks for your reply, I got it! I now also set ROI_Head_NMS=0.7 to get at least 10 precomputed boxes. But when I run your code, the loss seems very small. My guess is that the precomputed boxes are already accurate, so there are few positive samples for box classification and the regression offsets are small. I'm not sure about this. Screenshot: https://user-images.githubusercontent.com/51013927/144963249-c30e0fce-a750-4e17-a0c4-0b70398ae7f6.png

In addition, there seems to be a problem with the relation labels. For each relation there are topN*topN connections, each assigned -1 (ignore), 0 (negative), or 1 (positive). You normalize over all of these values and then compute the soft-label classification loss, which results in a negative loss. Again, I'm not entirely sure about this. Screenshots: https://user-images.githubusercontent.com/51013927/144963817-e9131e8f-ad74-49f9-ab2f-9348e620f1f1.png https://user-images.githubusercontent.com/51013927/144963911-bef0608c-6fb6-410f-83d4-b0be769d5115.png
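
To illustrate how a negative loss can arise from the situation described above, here is a minimal sketch. It assumes that "normalization" means dividing the raw labels by their sum and that the loss is a soft-target cross-entropy; neither assumption is confirmed by the thread, and the function names are hypothetical rather than taken from the repository.

```python
import math


def soft_targets(labels, mask_ignored):
    """Normalize {-1, 0, 1} labels into soft targets by dividing by their sum.

    With mask_ignored=True, the ignore label -1 is clamped to 0 first.
    (Assumed normalization scheme, for illustration only.)
    """
    values = [max(l, 0) for l in labels] if mask_ignored else list(labels)
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [v / total for v in values]


def soft_cross_entropy(logits, targets):
    """Soft-target cross-entropy: -sum_i t_i * log_softmax(logits)_i."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, targets))


if __name__ == "__main__":
    labels = [1, 1, -1, 0]           # two positives, one ignored, one negative
    logits = [0.0, 0.0, -10.0, 0.0]  # the ignored connection gets a low score

    naive = soft_cross_entropy(logits, soft_targets(labels, mask_ignored=False))
    masked = soft_cross_entropy(logits, soft_targets(labels, mask_ignored=True))
    print("ignore label kept as -1:", naive)
    print("ignore label masked out:", masked)
```

When the -1 entries stay in the targets, they act as negative weights on the log-probabilities, so the loss is no longer bounded below by zero. Masking them before normalizing keeps every target non-negative and the loss non-negative.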

youngfly11 commented 2 years ago

I'm sorry, I have forgotten some of the implementation details. I strongly recommend stepping through our code line by line with ipdb; every detail is in the code.



JCZ404 commented 2 years ago

Hi, I'm sorry to bother you again. I have checked the generated proposals and found that, when computing the classification and regression losses, each phrase has only a few proposals with IoU > 0.5 against the ground-truth box, typically 1 or 2, and sometimes none at all. Is that normal, or are my generated proposals wrong? As you suggested, I set ROI_Head nms=0.3 and score_thresh=0.01 to generate at least 10 proposals per image. Could you share your NMS and score thresholds? Thanks a lot!
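
The check described above can be reproduced with a short script along these lines. It is a minimal sketch with hypothetical helper names, assuming boxes are given as (x1, y1, x2, y2) tuples in pixel coordinates.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals, gt_box, thresh=0.5):
    """Number of proposals whose IoU with the ground-truth box exceeds thresh."""
    return sum(1 for p in proposals if iou(p, gt_box) > thresh)


def coverage(samples, thresh=0.5):
    """Fraction of (proposals, gt_box) pairs with at least one positive proposal."""
    if not samples:
        return 0.0
    hits = sum(1 for props, gt in samples if count_positives(props, gt, thresh) > 0)
    return hits / len(samples)


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    props = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    print(count_positives(props, gt))
    print(coverage([(props, gt), ([(20, 20, 30, 30)], gt)]))
```

Running `coverage` over all phrases in a split shows how often no proposal clears the threshold. Before any box regression, those phrases cannot be grounded correctly by choosing among the proposals, so a low value points at the proposal extraction settings rather than at the grounding model.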

youngfly11 commented 2 years ago

I recommend using https://github.com/MILVLG/bottom-up-attention.pytorch to extract features. nms=0.3 and score_thresh=0.1 are fine.