borisgribkov opened 5 years ago
UPDATE: a larger batch size helps for affine transforms. The same holds for projective transforms.
Hi, can you send me a download link for VGGFace2? I can't download it successfully from the official website. Also: if I want to train the recognition-net part of e2e_face first, use it as initial weights for ST training on VGGFace2, and then train the ST on VGGFace2 again, should output_H and output_W of the ST layer match the size of the VGGFace2 images? Thanks. @borisgribkov
@WW2401 I used 224x224 input everywhere: the same size for the ST localization network's input as for the recognition network after the ST layer. As far as I remember, the authors decreased the input dimensions for the recognition network; check the paper. Regarding VGG2, did you use this link: http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/ ? Sorry, at the moment I can't upload it anywhere because of its size; try looking here: https://github.com/deepinsight/insightface/issues. But please note, you need the unaligned data.
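For reference, a minimal prototxt sketch of this wiring, assuming the daerduoCarey SpatialTransformerLayer fork mentioned later in the thread; the layer and blob names here are illustrative, not taken from the e2e_face repo:

```
# Both the localization network and the recognition network run at 224x224.
# "loc_features" is a placeholder for the output of the localization conv stack.
layer {
  name: "loc_theta"
  type: "InnerProduct"
  bottom: "loc_features"
  top: "theta"                            # 6 affine transform parameters
  inner_product_param { num_output: 6 }
}
layer {
  name: "st"
  type: "SpatialTransformer"
  bottom: "data"                          # same 224x224 input the localization net sees
  bottom: "theta"
  top: "warped"                           # 224x224 output fed to the recognition net
  st_param { output_H: 224 output_W: 224 }
}
```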
I have another question: when testing, do I just need to change the sizes H, W, H1, W1 (128) to 224 in test_lfw.m?
I didn't check the code you mention, but that sounds correct.
Thanks a lot.
Hi, would you mind showing me your code for cropping the VGGFace2 images? @borisgribkov
Just use a 224 crop in the Caffe data layer, and read the VGG2 paper for the training details.
The images in the released VGGFace2 are not a fixed size (like 256 or something else), so I think some preprocessing is needed.
That's right. Read the paper, because I don't remember the details, but it was something like resizing the shortest image dimension to 256.
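A minimal sketch of that setup, assuming the shortest-side resize to 256 is done offline when building the LMDB; the source path and batch size below are placeholders:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    crop_size: 224                   # random 224x224 crop at train time
    mirror: true                     # assumption: horizontal flip, as is usual for VGG2 training
  }
  data_param {
    source: "vggface2_train_lmdb"    # placeholder: LMDB built from the resized images
    batch_size: 64
    backend: LMDB
  }
}
```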
Dear @mikuhatsune First of all, thanks for the paper and the uploaded code. Unfortunately, I did not get good results, because of strange convergence behavior. Depending on the learning rate (0.03, 0.02, 0.01) and the downscale coefficients for the ST part (from 1E-4 to 5E-6) I have seen the following:

1. No convergence at all, loss stuck around 28; this is typical for the lower parameter values.
2. Convergence stops after the loss reaches about 12-13.
3. Good convergence, loss about 6-8, but then the loss suddenly rises to 28-30 and accuracy falls to 0.

This is for affine transforms; for projective transforms, training stops after several iterations and only Ctrl-C helps. Network: ResNet18. Dataset: VGG2. At the same time, this version https://github.com/daerduoCarey/SpatialTransformerLayer, for example, works fine, and with it I reproduced your paper's results for affine transforms.
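If the "downscale coefficients for the ST part" are applied as per-layer learning-rate multipliers (an assumption; the repo may instead scale the ST gradients directly), the usual Caffe idiom is to damp the theta-regression layer relative to the global learning rate:

```
layer {
  name: "loc_theta"
  type: "InnerProduct"
  bottom: "loc_features"       # placeholder name for the localization features
  top: "theta"
  param { lr_mult: 0.0001 }    # weights: effective lr = global lr * 1e-4
  param { lr_mult: 0.0001 }    # bias
  inner_product_param { num_output: 6 }
}
```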
Should to_compute_dU be set to true for training the ST? I set to_compute_dU of the SpatialTransformerLayer to false when training.
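For context: in the daerduoCarey fork, to_compute_dU controls whether the layer backpropagates into its image input U (the first bottom) in addition to the theta bottom. A sketch, assuming that fork's parameter names:

```
layer {
  name: "st"
  type: "SpatialTransformer"
  bottom: "data"          # U: the raw data blob needs no gradient
  bottom: "theta"
  top: "warped"
  st_param {
    output_H: 224
    output_W: 224
    to_compute_dU: false  # set true only if U is a learned feature map that needs gradients
  }
}
```

When the ST sits directly on the data layer, dU is not needed, since the input image receives no gradient; gradients still reach the localization network through theta.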
I really appreciate your effort in reproducing this work!! Caffe is notoriously hard to debug... I guess the main message of this paper is that a Spatial Transformer can work in face recognition; beyond that, it's just the usual deep-learning parameter tuning.