mikuhatsune / e2e_face

code for "Towards End-to-End Face Recognition through Alignment Learning"

No convergence #2

Open borisgribkov opened 5 years ago

borisgribkov commented 5 years ago

Dear @mikuhatsune, first of all, thanks for the paper and the uploaded code. Unfortunately, I did not get good results because of strange convergence behavior. Depending on the learning rate (0.03, 0.02, 0.01) and the downscale coefficient for the ST part (from 1E-4 to 5E-6), I have seen the following: (1) no convergence at all, loss around 28, which is typical for the lower parameter values; (2) convergence stops after the loss reaches about 12-13; (3) good convergence, loss about 6-8, but then the loss suddenly rises to 28-30 and accuracy falls to 0. This is for affine transforms; for projective transforms, training hangs after several iterations and only Ctrl-C helps. Network: ResNet18, dataset: VGGFace2. At the same time, this version https://github.com/daerduoCarey/SpatialTransformerLayer works fine, and I reproduced your paper results for affine transforms with it.
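For clarity, by the "downscale coefficient for the ST part" I mean the small multiplier applied to the localization branch; roughly, it corresponds to something like a per-layer lr_mult on the theta-regression layer in the prototxt. A minimal sketch (layer and blob names here are placeholders, only the lr_mult values matter):

```
layer {
  name: "loc_theta"            # placeholder name for the theta-regression layer
  type: "InnerProduct"
  bottom: "loc_features"       # features from the ST localization network
  top: "theta"
  param { lr_mult: 0.0001 }    # weight LR multiplier, swept from 1e-4 down to 5e-6
  param { lr_mult: 0.0001 }    # bias LR multiplier
  inner_product_param { num_output: 6 }   # 6 outputs for affine, 8 for projective
}
```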

borisgribkov commented 5 years ago

UPDATE: a larger batch size helps for affine transforms. The same holds for projective transforms.

WW2401 commented 5 years ago

Hi, can you send me a download link for VGGFace2? I can't download it successfully from the official website. If I want to train the recognition-net part of e2e_face as initial weights for ST training using VGGFace2, and then train the ST using VGGFace2 again, should the output_H and output_W of the ST layer be the size of the images in VGGFace2? Thanks. @borisgribkov

borisgribkov commented 5 years ago

@WW2401 I used 224*224 input everywhere: the input to the ST localization network has the same size as the input to the recognition network after the ST layer. As far as I remember, the authors decreased the input dimensions for the recognition network; check the paper. Regarding VGGFace2, did you use this link http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/ ? Sorry, at the moment I can't upload it anywhere because of its size; try to find it here https://github.com/deepinsight/insightface/issues, but please note that you need the non-aligned data.
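To make the sizes concrete, my ST layer was configured roughly like the sketch below (layer and blob names are placeholders; the st_param field names are from the SpatialTransformerLayer proto, as far as I remember them):

```
layer {
  name: "st"                   # placeholder name
  type: "SpatialTransformer"   # layer type from the SpatialTransformerLayer repo
  bottom: "data"               # 224x224 input image
  bottom: "theta"              # transform parameters from the localization branch
  top: "st_output"
  st_param {
    output_H: 224              # warped output kept at the same 224x224 resolution
    output_W: 224
  }
}
```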

WW2401 commented 5 years ago

> @WW2401 I used 224*224 input everywhere: the input to the ST localization network has the same size as the input to the recognition network after the ST layer. As far as I remember, the authors decreased the input dimensions for the recognition network; check the paper. Regarding VGGFace2, did you use this link http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/ ? Sorry, at the moment I can't upload it anywhere because of its size; try to find it here https://github.com/deepinsight/insightface/issues, but please note that you need the non-aligned data.

I have another question: when testing, do I just need to change the sizes H, W, H1, W1 (128) to 224 in test_lfw.m?

borisgribkov commented 5 years ago

I didn't check the code you mention, but that sounds correct.

WW2401 commented 5 years ago

> I didn't check the code you mention, but that sounds correct.

thanks a lot.

WW2401 commented 5 years ago

Hi, would you mind showing me your code for cropping the images for VGGFace2? @borisgribkov

borisgribkov commented 5 years ago

Just use a 224 crop in the Caffe data layer; read the VGGFace2 paper for the training details.

WW2401 commented 5 years ago

> Just use a 224 crop in the Caffe data layer; read the VGGFace2 paper for the training details.

The images in the released VGGFace2 do not have a fixed size (like 256 or something else), so I think some preprocessing is needed.

borisgribkov commented 5 years ago

>> Just use a 224 crop in the Caffe data layer; read the VGGFace2 paper for the training details.
>
> The images in the released VGGFace2 do not have a fixed size (like 256 or something else), so I think some preprocessing is needed.

That's right; read the paper because I don't remember it in detail, but it was something like resizing the shortest image dimension to 256.
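So the training data layer is just the standard Caffe one, roughly like the sketch below; the shortest-side-to-256 resize is done offline before building the LMDB, and the source path, mean values, and batch size are only illustrative:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 224        # random 224x224 crop at train time
    mirror: true          # horizontal flip augmentation
    mean_value: 104       # BGR channel means, illustrative values
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "vggface2_train_lmdb"   # placeholder path; images pre-resized so the shortest side is 256
    batch_size: 64                  # larger batches helped convergence in my runs
    backend: LMDB
  }
}
```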

WW2401 commented 5 years ago

> Dear @mikuhatsune, first of all, thanks for the paper and the uploaded code. Unfortunately, I did not get good results because of strange convergence behavior. Depending on the learning rate (0.03, 0.02, 0.01) and the downscale coefficient for the ST part (from 1E-4 to 5E-6), I have seen the following: (1) no convergence at all, loss around 28, which is typical for the lower parameter values; (2) convergence stops after the loss reaches about 12-13; (3) good convergence, loss about 6-8, but then the loss suddenly rises to 28-30 and accuracy falls to 0. This is for affine transforms; for projective transforms, training hangs after several iterations and only Ctrl-C helps. Network: ResNet18, dataset: VGGFace2. At the same time, this version https://github.com/daerduoCarey/SpatialTransformerLayer works fine, and I reproduced your paper results for affine transforms with it.

Should to_compute_dU be set to true for training the ST? I set to_compute_dU of SpatialTransformerLayer to false when training.
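For reference, my current setting looks roughly like this (layer and blob names are placeholders; only the to_compute_dU flag is the point):

```
layer {
  name: "st"
  type: "SpatialTransformer"
  bottom: "data"
  bottom: "theta"
  top: "st_output"
  st_param {
    output_H: 224
    output_W: 224
    to_compute_dU: false   # currently disabled; should this be true when training the ST?
  }
}
```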

mikuhatsune commented 4 years ago

I really appreciate your effort in reproducing this work!! Caffe is notoriously hard to debug... I guess the main message of this paper is that a Spatial Transformer can work in face recognition; beyond that it is just the usual deep-learning parameter tuning.