mks0601 / A-Convolutional-Neural-Network-Cascade-for-Face-Detection

TensorFlow implementation of "A Convolutional Neural Network Cascade for Face Detection", CVPR 2015
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Li_A_Convolutional_Neural_2015_CVPR_paper.pdf

Training Issues #5

Closed: 96imranahmed closed this issue 6 years ago

96imranahmed commented 7 years ago

Hi, I've spent a while trying to train these nets for a project, but I keep running into issues (and the results are garbage).

For the calib nets, how many images do you train on? I find that if I use more than 4,500 training images, there is a RAM (yes, RAM) overflow on my computer and I can't train with any more images; my PC has 16GB of RAM. I'm using the HollywoodHeads dataset, which is very similar to the positive training dataset used in this paper (but has slightly more perspectives, etc.).

Also, for the 12-calib net, what is your average accuracy? I can only seem to get around 0.6, even after switching to Adam and tinkering with learning rates. Is this good or bad? I don't know what to compare it against. For the 24-net and 48-net, I get an accuracy of 1.0 and a very low loss; this also seems wrong, but I can't seem to get anything different.

I had a look at the dataset and can verify that I'm extracting all the bounding boxes correctly. Any advice on what I'm doing wrong would be much appreciated!


For the remaining nets, I chose 50,000 training images and 100 negative images (with which hard mining yields about 0.5M different patches). Does this sound like roughly the same approach you're taking?

Apologies for the barrage of questions; it's just that this has stumped me and I can't seem to get better results. Thanks!

@mks0601 @tengshaofeng

mks0601 commented 7 years ago

Hi, 1) I used the AFLW dataset for positive images; it has about 20k images. My computer has 32GB of RAM, so that was enough to load all the images into memory. If you are suffering from a shortage of RAM, loading each image from disk would be a solution (see the sketch after this list).

2) I can't remember exactly, but the accuracy of the 12-calib net was better than 0.6.

3) I don't understand the question. Extracting all the bounding boxes? When?

4) For the hard negative mining, the paper says they used about 5k images, and I used the same amount.
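
A minimal sketch of the disk-loading idea from point 1): keep only file paths in memory and load each batch of images on demand. The directory layout, the 12x12 input size, and the normalization are illustrative assumptions, not the repo's actual pipeline.

```python
# Load training images lazily from disk instead of holding the whole
# dataset in RAM. Only one batch of decoded images is resident at a time.
import os
import numpy as np
from PIL import Image

def batch_generator(image_dir, batch_size=128, input_size=12):
    paths = [os.path.join(image_dir, f) for f in sorted(os.listdir(image_dir))]
    while True:
        np.random.shuffle(paths)
        for start in range(0, len(paths) - batch_size + 1, batch_size):
            batch = []
            for p in paths[start:start + batch_size]:
                img = Image.open(p).convert('RGB').resize((input_size, input_size))
                batch.append(np.asarray(img, dtype=np.float32) / 255.0)
            yield np.stack(batch)

# Usage: feed each batch into the training step instead of a preloaded array.
# for x_batch in batch_generator('train/positives'):
#     sess.run(train_op, feed_dict={x: x_batch})
```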

I'm rearranging the code, because this is the very first code I wrote when I was studying deep learning for the first time. It would be better to reference the rearranged code.

Best, Gyeongsik Moon

tengshaofeng commented 7 years ago

Hi,

  1. If your RAM is not enough for the training data, you can split it into several chunks and read each chunk from disk in turn.
  2. I have tested the average accuracy with all the training data: it is about 0.62 for the 12-net-calib, 0.78 for the 24-net-calib, and 0.85 for the 48-calib.
  3. You can adjust the thresholds of the 12-net, 24-net, and 48-net to get better results. I have also modified the NMS function to divide the overlapping area by min(a, b), where a and b are the two box areas (see the sketch below).
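
A minimal sketch of the modified NMS tengshaofeng describes in point 3, dividing the intersection by the smaller of the two box areas (intersection over minimum) instead of by the union. The [x1, y1, x2, y2, score] box layout is an assumption:

```python
# NMS using intersection-over-minimum-area as the overlap criterion:
# overlap = intersection / min(area_a, area_b).
import numpy as np

def nms_min_area(boxes, threshold=0.5):
    if len(boxes) == 0:
        return boxes
    boxes = boxes[boxes[:, 4].argsort()[::-1]]  # highest score first
    keep = []
    while len(boxes) > 0:
        best, rest = boxes[0], boxes[1:]
        keep.append(best)
        # Intersection of `best` with every remaining box.
        ix1 = np.maximum(best[0], rest[:, 0])
        iy1 = np.maximum(best[1], rest[:, 1])
        ix2 = np.minimum(best[2], rest[:, 2])
        iy2 = np.minimum(best[3], rest[:, 3])
        inter = np.maximum(0, ix2 - ix1) * np.maximum(0, iy2 - iy1)
        area_best = (best[2] - best[0]) * (best[3] - best[1])
        area_rest = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        # Dividing by the smaller area suppresses a small box nested inside
        # a larger one even when their IoU would be low.
        overlap = inter / np.minimum(area_best, area_rest)
        boxes = rest[overlap <= threshold]
    return np.array(keep)
```
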
96imranahmed commented 7 years ago

Thanks guys, this is definitely helping me move in the right direction! @tengshaofeng, I'll see if I can reproduce those numbers!

I'll have a go at reading images from disk instead of loading them all into memory.

With regards to neg-mining, the paper mentions training on 200,000 "patches" and 5,800 "images". From what I gather, you use the 5,800 negatives for the 12-net, and then hard-mine a selection of these images to generate the 200,000 patches for the 24-net and 48-net. Is that the method you used?

mks0601 commented 7 years ago

No, you have it the other way around: the 200,000 random patches are used to train the 12-net, and the 5,800 images are used to train the 24-net and 48-net (hard negative mining).

Slide the 12-net over each of the 5,800 negative images at multiple scales and collect the patches whose confidence is larger than your 12-net threshold. The collected patches are the negative samples for the 24-net.

The negative samples for the 48-net can be obtained in the same way (run both the 12-net and the 24-net over each of the 5,800 negative images).
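
To make the pipeline concrete, here is a minimal sketch of the mining loop described above. `run_12net` is a hypothetical stand-in for the repo's trained detector, and the stride and scale pyramid are illustrative assumptions:

```python
# Hard negative mining for the 24-net: run the trained 12-net over each
# negative image at multiple scales and keep every window whose face
# confidence exceeds the 12-net threshold.
import cv2
import numpy as np

def mine_hard_negatives(negative_images, run_12net, threshold,
                        window=12, stride=4, scales=(1.0, 0.7, 0.5, 0.35)):
    hard_negatives = []
    for img in negative_images:  # images known to contain no faces
        for scale in scales:
            h, w = int(img.shape[0] * scale), int(img.shape[1] * scale)
            if h < window or w < window:
                continue
            scaled = cv2.resize(img, (w, h))  # cv2 takes (width, height)
            for y in range(0, h - window + 1, stride):
                for x in range(0, w - window + 1, stride):
                    patch = scaled[y:y + window, x:x + window]
                    # A patch the 12-net wrongly accepts is a false positive,
                    # i.e. a hard negative for training the 24-net.
                    if run_12net(patch) > threshold:
                        hard_negatives.append(patch)
    return np.stack(hard_negatives)
```

For the 48-net, the same loop would additionally require each patch to pass the trained 24-net before it is kept.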

96imranahmed commented 7 years ago

Ah sorry, I must've misunderstood the paper. I'm not even at that stage yet, so it's all good!

Just an update w.r.t. the calib nets: I've been training by loading data from disk (so now I'm training on a 25k-image dataset, about the same size as AFLW), but now I'm getting a 12-net accuracy of about 0.3 (with a loss of 2.3 after 50 epochs). Things seem to settle around that level and I'm not exactly sure what I'm doing wrong; my preprocessing is definitely giving the same sort of input to the net.

I'll keep you all posted, but I may implement the 12-calib in Keras to check that I'm doing things properly. I'll let you know if I get different results. Thanks for all your help!

yuye1992 commented 6 years ago

Hi, when I run `python hard_neg_mining.py 24`, it raises ResourceExhaustedError: OOM when allocating tensor with shape[#,#,#,#]. Can you tell me how to solve it? Thank you.
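
For anyone hitting the same error: an OOM here usually means too many candidate patches are pushed through the network in a single `sess.run` call. A common fix (not necessarily how yuye1992 solved it) is to run the forward pass in fixed-size chunks; a minimal sketch, where `sess`, `probs`, and `x` are assumed names for the session, output tensor, and input placeholder:

```python
# Run inference in fixed-size chunks so each sess.run only allocates a
# chunk-sized activation tensor, keeping GPU memory usage bounded.
import numpy as np

def predict_in_chunks(sess, probs, x, patches, chunk_size=512):
    outputs = []
    for start in range(0, len(patches), chunk_size):
        chunk = patches[start:start + chunk_size]
        outputs.append(sess.run(probs, feed_dict={x: chunk}))
    return np.concatenate(outputs, axis=0)
```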

yuye1992 commented 6 years ago

I have solved it!