Closed davidnvq closed 6 years ago
@quangdtsc Hi, Thanks.
A. Just find `topK = topN - # of already matched anchor boxes`, like `[2 + 2, 3 + 1, 1 + 3]`.
B. Yes. Besides, we have a variable that marks the best-match face for every anchor, i.e., every face has a list that stores its candidate anchors, like `face1_anchor_list = []`, `face2_anchor_list = [box3, box5]`, `face3_anchor_list = [box1, box2, box4]`. So we iterate `list_faces` sequentially from face1 -> face2 -> face3 and find the topK within each face's own anchor list.
C. Please refer to B for the answer.
Besides, some faces do not have enough candidate anchors, e.g., `face1` in the above example, so they will not match more anchors in step 2.
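The reply above can be traced on the 5-box example from the question. This is only a minimal sketch under stated assumptions, not the authors' actual code: the names `build_candidate_lists` and `compensate` are hypothetical, the IoU values are copied from the question's table, and any IoU threshold the real implementation may apply is left out because the thread does not state one.

```python
# IoU values copied from the example table in the question.
BOXES = ["box1", "box2", "box3", "box4", "box5"]
IOU = {
    "face1": dict(zip(BOXES, [0.12, 0.21, 0.06, 0.13, 0.24])),
    "face2": dict(zip(BOXES, [0.24, 0.08, 0.23, 0.10, 0.34])),
    "face3": dict(zip(BOXES, [0.33, 0.22, 0.01, 0.20, 0.02])),
}
ALREADY_MATCHED = {"face1": 2, "face2": 3, "face3": 1}  # matches from step 1
TOP_N = 4


def build_candidate_lists(iou, boxes):
    """Put every box into the candidate list of its best-matching face."""
    candidates = {face: [] for face in iou}
    for box in boxes:
        best_face = max(iou, key=lambda face: iou[face][box])
        candidates[best_face].append((iou[best_face][box], box))
    return candidates


def compensate(iou, boxes, already_matched, top_n):
    """For each face, take topK = topN - matched boxes from its own list."""
    candidates = build_candidate_lists(iou, boxes)
    extra = {}
    for face in iou:  # face1 -> face2 -> face3 (dict insertion order)
        top_k = max(top_n - already_matched[face], 0)
        ranked = sorted(candidates[face], reverse=True)  # highest IoU first
        extra[face] = [box for _, box in ranked[:top_k]]
    return extra


extra = compensate(IOU, BOXES, ALREADY_MATCHED, TOP_N)
print(extra)
```

Because every box sits in exactly one face's candidate list, the iteration order over faces cannot cause one face to take a box that is a better match for another face. In this sketch `face1` has no candidates and therefore stays at 2 matched anchors.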
@sfzhang15 Your answer is quite clear to me. Thank you for your reply. Regarding training time, could you share roughly how long 1 full epoch takes with your computer configuration? I found that the classification layer conv3_3 slows the training down quite a lot. One epoch takes me 1 hour and 10 minutes (quite slow) on an Nvidia GTX 1080 (8 GB) with an 8-core CPU. Your answer would be a nice benchmark for me to confirm that my implementation is correct in terms of time complexity.
@quangdtsc For 640x640 input images with a batch size of 32, we use 2 Titan X (Maxwell) GPUs to train our model. Every iteration takes about 4.5 s, so 120,000 (12W) iterations need about 6 days. As you said, the classification layer conv3_3 slows the training down quite a lot.
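The figures in this reply can be checked with a few lines of arithmetic. The sketch below reads "12W" as 12 x 10,000 = 120,000 iterations. The per-epoch estimate additionally assumes a training set of 12,880 images (the size of the WIDER FACE training split); the thread does not state the dataset, so treat `images_per_epoch` as a hypothetical value to replace with your own.

```python
import math

seconds_per_iteration = 4.5   # from the reply above
iterations = 120_000          # "12W" read as 12 x 10,000 (assumption)
batch_size = 32               # from the reply above
images_per_epoch = 12_880     # ASSUMPTION: not stated in the thread

# Total training time quoted as "about 6 days".
total_days = seconds_per_iteration * iterations / 86_400

# Rough time for one epoch at the same per-iteration speed.
iterations_per_epoch = math.ceil(images_per_epoch / batch_size)
epoch_minutes = iterations_per_epoch * seconds_per_iteration / 60

print(f"total: {total_days:.2f} days")
print(f"epoch: {iterations_per_epoch} iterations, {epoch_minutes:.1f} min")
```

Under these assumptions the total comes to 6.25 days, which agrees with the "about 6 days" in the reply. The per-epoch number is only comparable to the 1 hour 10 minutes above if the dataset and batch size match.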
Thank you Shifeng for your great work, and big congratulations on your new CVPR 2018 and IJCAI work. I just looked around your website and I'm truly impressed by what you have done so far. I believe that you are and will be an awesome scientist who has a great impact on our society. Definitely!
Back to SFD: it is now more than one year since your publication, but I still find it useful and one of the state-of-the-art methods for face detection. However, I'm somewhat confused by the matching strategy of SFD. I found your answer in a previous issue:
As my understanding:

1. Each `anchor_box` can be assigned to only one ground truth label (or 1 face).
2. [...] `anchor_box` because of fact #1. At step 1, each `anchor_box` is assigned to the best match (to the face with the highest `iou`).
3. Suppose `N = 4`. There is a list of faces which have fewer than 4 anchor boxes, `list_faces = [face1, face2, face3]`, and some boxes, `available_anchor_boxes = [box1, box2, box3, box4, box5]`, are still available (they weren't assigned to any face in step 1). The number of matched anchor boxes for `face1, face2, face3` is `[2, 3, 1]`. All of them < `topN` (4).

|       | box1 | box2 | box3 | box4 | box5 |
|-------|------|------|------|------|------|
| face1 | 0.12 | 0.21 | 0.06 | 0.13 | 0.24 |
| face2 | 0.24 | 0.08 | 0.23 | 0.1  | 0.34 |
| face3 | 0.33 | 0.22 | 0.01 | 0.2  | 0.02 |
My question is:

A. Do we have to find `topN` more anchor boxes for `face1, face2, face3`, so that their new # of anchor boxes is `[2 + 4, 3 + 4, 1 + 4]`, or do we just find `topK = topN - # of already matched anchor boxes`, like `[2 + 2, 3 + 1, 1 + 3]`?

B. In the case of `[2 + 2, 3 + 1, 1 + 3]`, will we iterate `list_faces` sequentially from face1 -> face2 -> face3 to find `topK`?

C. If B is correct, for `face1` we will find the `topK = 2`, say, `[box5, box2]`. It means that we will mark `box5` and `box2` as matched anchors (then they will not be assigned to any other face anymore). However, the best match for `box5` is `face2`, and the best match for `box2` is `face3`. After assigning `topK` anchor boxes to `face1`, the available boxes are now `[box1, box3, box4]`. If this method is performed, `face2` will be assigned `box1`. The available boxes are now `[box3, box4]`, and we can only assign `box4` to `face3` even though we need to assign `topK = 3`.

I know that the probability of this overlapping case `C` happening is very low when we have a huge number of anchor boxes, say `33125` here, not `5` boxes like the above example. However, I just want to make sure whether your idea is implemented like this or differently.

I hope the above example is simple and intuitive enough to point out my confusion and make your idea clearer to us. I would really appreciate it if you could review my understanding 1, 2, 3 and my questions A, B, C. Many thanks and have a nice day.
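The sequential scenario worried about in question C can be traced in code. This is a sketch of the hypothetical greedy scheme described in the question, not of the authors' method: the function name `greedy_sequential` is made up for illustration, and the `MIN_IOU = 0.1` cutoff is an assumption (the thread does not state one) chosen so that the trace reproduces the walkthrough, in which `face3` ends up with only `box4`.

```python
# IoU values copied from the example table in the question.
BOXES = ["box1", "box2", "box3", "box4", "box5"]
IOU = {
    "face1": dict(zip(BOXES, [0.12, 0.21, 0.06, 0.13, 0.24])),
    "face2": dict(zip(BOXES, [0.24, 0.08, 0.23, 0.10, 0.34])),
    "face3": dict(zip(BOXES, [0.33, 0.22, 0.01, 0.20, 0.02])),
}
ALREADY_MATCHED = {"face1": 2, "face2": 3, "face3": 1}
TOP_N = 4
MIN_IOU = 0.1  # ASSUMPTION: hypothetical cutoff, not stated in the thread


def greedy_sequential(iou, boxes, already_matched, top_n, min_iou):
    """Each face, in order, grabs its topK boxes from ALL remaining boxes."""
    available = list(boxes)
    assigned = {}
    for face in iou:  # face1 -> face2 -> face3 (dict insertion order)
        top_k = max(top_n - already_matched[face], 0)
        usable = [box for box in available if iou[face][box] > min_iou]
        ranked = sorted(usable, key=lambda box: iou[face][box], reverse=True)
        assigned[face] = ranked[:top_k]
        # Boxes taken by this face are no longer available to later faces.
        available = [box for box in available if box not in assigned[face]]
    return assigned, available


assigned, leftover = greedy_sequential(IOU, BOXES, ALREADY_MATCHED, TOP_N, MIN_IOU)
print(assigned)
print(leftover)
```

In this trace `face1` takes `box5` and `box2` even though their best matches are `face2` and `face3`, which is exactly the conflict the question raises. It arises only because each face searches all remaining boxes rather than a per-face candidate list.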