kpzhang93 / MTCNN_face_detection_alignment

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks
MIT License

Understanding cascading of sizes in mtcnn #21

Closed sidgan closed 6 years ago

sidgan commented 6 years ago

Hi, I'm trying to follow the code and understand how MTCNN works. I understand that for each image, and for each scale, a detection comes from each of the networks; in particular I am looking at PNet right now.

The image is rescaled according to the scales computed earlier, and the rescaled image goes into PNet, as shown in the code:

% Code file: detect_face.m
if fastresize
    im_data = imResample(im_data, [hs ws], 'bilinear');
else
    im_data = (imResample(img, [hs ws], 'bilinear') - 127.5) * 0.0078125;
end
PNet.blobs('data').reshape([hs ws 3 1]);
out = PNet.forward({im_data});

For reference I have printed out the original size and the rescaled size:

- Original height: 340, original width: 151
- Scale used (computed earlier): 0.107493555074
- Rescaled height: 37, rescaled width: 17
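For context, the scale factors themselves are typically generated as a geometric image pyramid. A minimal sketch in Python, assuming the standard MTCNN defaults `minsize=20` and `factor=0.709` (the function name and parameters are illustrative, not from detect_face.m):

```python
import math

def compute_scales(height, width, minsize=20, factor=0.709):
    """Geometric image pyramid: shrink the image by `factor` each step
    until the smaller side would fall below PNet's 12x12 window."""
    m = 12.0 / minsize                 # scale so a minsize face maps to 12 px
    min_side = min(height, width) * m
    scales = []
    while min_side >= 12:
        scales.append(m)
        m *= factor
        min_side *= factor
    return scales

scales = compute_scales(340, 151)
```

With these defaults, the printed scale 0.107493555074 is exactly `(12/20) * 0.709**5`, the sixth level of the pyramid, and `ceil(340 * scale) = 37`, `ceil(151 * scale) = 17` reproduce the rescaled sizes above.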

The net in question is PNet, and in det1.prototxt its declared input dimensions are h = 12 and w = 12.

% Code file: det1.prototxt 
input_dim: 1
input_dim: 3
input_dim: 12
input_dim: 12

What I don't understand is: how does the input go from the rescaled image size (e.g. 37x17) to the 12x12 declared in the prototxt?

zgzhong commented 6 years ago

@sidgan Hi! I have the same question as you. Did you figure it out? Thanks.

xcls1117 commented 6 years ago

PNet is a fully convolutional network, so the input can be any size as long as it is at least 12x12; the `input_dim` values in det1.prototxt are just a placeholder, and `PNet.blobs('data').reshape([hs ws 3 1])` in the code above resizes the input blob to the actual image size before each forward pass. The spatial size of the output map depends on the input size: the network produces one prediction per 12x12 window of the input. @sidgan @Lisupy
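To illustrate this, PNet's spatial layers in det1.prototxt are, in order: an unpadded 3x3 convolution, a 2x2 max-pool with stride 2, and two more unpadded 3x3 convolutions (the final 1x1 conv heads do not change the spatial size). A sketch of the resulting output-size arithmetic for one dimension, assuming Caffe's default ceil-mode pooling:

```python
import math

def pnet_output_size(n):
    """Trace one spatial dimension through PNet's layers:
    conv 3x3/s1 -> max-pool 2x2/s2 (Caffe rounds up) -> conv 3x3/s1 -> conv 3x3/s1."""
    n = n - 2                        # conv1: 3x3, no padding
    n = math.ceil((n - 2) / 2) + 1   # pool1: 2x2, stride 2, ceil mode
    n = n - 2                        # conv2: 3x3, no padding
    n = n - 2                        # conv3: 3x3, no padding
    return n

print(pnet_output_size(12))  # -> 1: exactly one prediction for a 12x12 input
```

A 12x12 input collapses to a single output cell, which is why 12x12 is the minimum; the 37x17 rescaled image from earlier in the thread yields a 14x4 map, i.e. one face/bbox prediction per 12x12 window at stride 2.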