philipperemy / yolo-9000

YOLO9000: Better, Faster, Stronger - Real-Time Object Detection. 9000 classes!
Apache License 2.0

Face/head detection? #3

Closed by abagshaw 6 years ago

abagshaw commented 7 years ago

I was wondering, do you know if the pre-trained models for yolo9000 support face and/or head detection? For an object detection framework that can detect 9000+ classes, I would expect heads and faces to be among them, since they're fairly basic - but in my testing I can't get yolo9000 to detect faces (it just detects the whole person).

The 9k.names file here: https://github.com/pjreddie/darknet/blob/master/data/9k.names does have labels for both face and head - but I've never had any of those objects detected. Any ideas?

philipperemy commented 7 years ago

From my experience, yolo9000 does not recognise face and/or head.

I ran an experiment at my office. You can see the results of yolo9000 on many frames: https://www.youtube.com/watch?v=2TyKm5cPYm4

I have no idea on this point as I'm not familiar with how they trained the model. But I think if you want to do that, you will have to re-train the whole model (or do transfer learning to some extent). It's probably detailed in the paper. I'm also curious about the dataset they used for those 9000 categories.

abagshaw commented 7 years ago

@philipperemy Hmm, that's a shame. Thanks for the video. It does make me wonder though: if 2 out of ~9000 labels are clearly not being detected, how many more labels are effectively unused?

philipperemy commented 7 years ago

I think we have to tweak the algorithm a little bit with the thresholds. Probably the probability of recognising a person is much higher than the probability of recognising a face. Also, I don't know how they deal with overlapping objects (a face is part of a person). Did you try to run yolo-9000 on a picture with just a face in it (where the body isn't visible)?

abagshaw commented 7 years ago

No, I haven't tried running it on just a picture of a face - I'll give that a shot though. I'll also play around with the thresholds to see what I can get. Thanks!

abagshaw commented 7 years ago

@philipperemy I've been playing around with https://github.com/thtrieu/darkflow/pull/257 and even though I've set the hierarchy threshold value super low (0.001) I'm still not getting face or head detections showing up on images that have faces and heads clearly visible. It's possible there's a bug in that implementation of YOLO9000, as changing the hierarchy threshold value doesn't seem to have any effect on the predictions. I wish there were better documentation on YOLO9000 somewhere (about the intricacies of how this hierarchical detection works, how one could train a model for it, etc.). I've read through the research paper a number of times but it still isn't clear to me. Oh well!

philipperemy commented 7 years ago

@abagshaw here are the best explanations I've found so far about YOLO9000 (related to darknet and not darkflow):

Quoted from the first one:

the default is to threshold the class predictions at .5 (i.e. traverse the tree until the probability of that class would go below .5). but you can make it lower or higher using the -hier flag like -hier .2 or -hier .7 you should play around with this a little depending on your use case, etc. using -hier 0 gives the most confident leaf node. also good to play around with the -thresh setting to detect objects that it's less confident about, default is like .25 i think. it's definitely not a perfect network but it does pretty well on a lot of things.

You should try to make it work directly with the original darknet framework. Here they mention two thresholds: thresh and hier.

./darknet detector test cfg/combine9k.data cfg/yolo9000.cfg ~/data/networks/yolo/yolo9000.weights data/dog.jpg -thresh .12 -hier .6

relh commented 7 years ago

Basically, the first threshold determines whether or not there is an object in the predicted bounding box. The network predicts an explicit 'objectness' score, separate from the class predictions, and if that score is above the threshold a bounding box is returned.

Once a box is predicted, the network traverses the tree of candidate classes and multiplies through the conditional probabilities at each level, e.g. object * animal * feline * house cat. The hierarchical threshold is used in this second step, entirely after and separate from deciding whether there is an object at all, to decide whether following the tree further down to a more specific class is the right action to take. When this threshold is 0, the traversal basically follows the highest-probability branch all the way to a leaf node.
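In rough Python pseudocode the second step looks something like this (an illustrative sketch, not darknet's actual code; the Node structure and the probability values are made up):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    cond_prob: float              # P(node | parent), as predicted by the network
    children: List["Node"] = field(default_factory=list)

def pick_class(root: Node, hier_thresh: float):
    # Walk down the tree, multiplying conditional probabilities; stop when the
    # best child's joint probability would drop below hier_thresh (the -hier flag).
    node, joint = root, root.cond_prob
    while node.children:
        best = max(node.children, key=lambda c: c.cond_prob)
        if joint * best.cond_prob < hier_thresh:
            break                 # stay at the more general class
        node, joint = best, joint * best.cond_prob
    return node.name, joint

# e.g. object -> animal -> feline -> house cat
tree = Node("object", 1.0, [Node("animal", 0.9, [Node("feline", 0.6, [Node("house cat", 0.7)])])])
print(pick_class(tree, 0.5))   # stops at a more general class ("feline")
print(pick_class(tree, 0.0))   # always reaches a leaf, matching "-hier 0 gives the most confident leaf node"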

So if you want more detections, the trick is to lower the first threshold value. If the detections are too specific, raise the second value so the traversal stops at a more general class.

Further, there is a third step, NMS (non-maximum suppression), which looks at all bounding boxes that made it past the 'objectness' threshold and removes the less confident of any boxes that overlap each other above a certain IOU threshold (a third potentially tunable parameter).
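A minimal sketch of that NMS step, with boxes as (x1, y1, x2, y2, score) tuples (again illustrative, not darknet's actual code):

def iou(a, b):
    # intersection-over-union of two boxes
    ax1, ay1, ax2, ay2, _ = a
    bx1, by1, bx2, by2, _ = b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, iou_thresh=0.45):
    # keep the most confident box, drop any remaining box that overlaps it too much
    keep = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, kept) < iou_thresh for kept in keep):
            keep.append(box)
    return keep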

philipperemy commented 7 years ago

@relh thanks a lot for this useful insight!

abagshaw commented 7 years ago

@relh @philipperemy Thanks for all your help. I think this makes more sense now. I'm still running into some problems though. I've been playing around with changing the -hier threshold for darknet and I can't seem to see any changes no matter what value I use: -hier 0.0001, -hier 0 and -hier 0.9999 all produce the same results for me on all the images I've tested (and I'd expect those values to produce drastically different results based on what you've written). Seems odd.

On a different note (and back to the original point of getting face/head detections to show up), I would think this has more to do with the -thresh value because, as @relh mentioned, that's the value that determines whether an object is detected at all; the -hier value only affects which label is assigned to that object (if I'm understanding this correctly). I suppose the -hier value could have an indirect effect on the detection of faces/heads, since assigning them the same label as the main object could result in the NMS step removing the face/head box, but I'm not sure about that.

relh commented 7 years ago

If you check out pjreddie's reply to https://groups.google.com/forum/#!topic/darknet/Cw99SpxKK2A, you'll see that he describes the -hier flag as working the same way as I describe above.

However, in practice, although the -hier value gets passed through to the appropriate function calls (demo/detector/test, get_region_boxes and hierarchy_top_prediction), it has no effect.

I tested various values for the -hier flag and found that it was similarly ineffective. My hunch is that he might not be softmax-ing the layer output as described in the paper. This would make sense because the reported confidence values come straight from the 'objectness' score of the detector, and the only place the conditional probability tree would be used is for this type of prediction (if this were the case, it would mean the 200-class case similarly just chooses a max value). To test this I ran with increasing levels of -hier.

I was hoping to find a threshold where the class predictions would change, but this was as far as I got before it ran into a segmentation fault:

./darknet detector test cfg/combine9k.data cfg/yolo9000.cfg cfg/yolo9000.weights -hier 9.42425 ~/Code/darkflow/sample_img/sample_person.jpg (anything above 9.42425ish leads to a segfault)

I'm going to try to jump into gdb or add some prints here to figure out what's going on.
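(For anyone trying the same thing, something along these lines should work, assuming darknet was compiled with debug symbols, e.g. DEBUG=1 in the Makefile; hierarchy_top_prediction is the darknet function the -hier value should eventually reach:)

gdb --args ./darknet detector test cfg/combine9k.data cfg/yolo9000.cfg cfg/yolo9000.weights -hier 0.5 ~/Code/darkflow/sample_img/sample_person.jpg
(gdb) break hierarchy_top_prediction
(gdb) run
(gdb) info args        # check whether the -hier value actually reaches this call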

philipperemy commented 7 years ago

@abagshaw any updates so far?

abagshaw commented 7 years ago

@philipperemy Unfortunately I still haven't been able to get face/head predictions showing up. Not only that, but in general I've found that the YOLO9000 model misses some predictions that the YOLOv2 COCO model picks up just fine, and the bounding boxes for the predictions it does generate are generally less accurate as well. Unless I'm missing something, it seems to me that either the pretrained model for YOLO9000 is just not a great model, or maybe the net design is in general less accurate as it tries to accommodate all those new classes? For the time being I'm sticking with YOLOv2 until some new developments come along.

philipperemy commented 7 years ago

@abagshaw no problem! YOLO9000 is less accurate because of the number of classes it has to handle. Those links discuss the accuracy (mAP) of the models:

philipperemy commented 6 years ago

Just an update here.

I advise you to use this repository for face detection: https://github.com/ageitgey/face_recognition

I'm going to close this issue. Feel free to reopen it if you have any questions.

Thank you all.

debhazra commented 5 years ago

@philipperemy: the GitHub link you gave for face recognition uses OpenCV and the face_recognition module, but it's too slow to detect and recognize faces, so it won't help much when people are in motion. For such a use case a YOLO face detector together with OpenCV would probably be the better choice. So I'd like to know if there is any Python code where YOLO and OpenCV have been used together for faster face detection and recognition. Help from anyone would be much appreciated.
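Roughly the kind of pipeline I have in mind, as a minimal sketch using OpenCV's dnn module (yolo-face.cfg / yolo-face.weights are just placeholders for a face-trained YOLO model, not files that ship with this repo):

import cv2
import numpy as np

# load a (hypothetical) face-trained YOLO model through OpenCV's dnn module
net = cv2.dnn.readNetFromDarknet("yolo-face.cfg", "yolo-face.weights")
out_layers = net.getUnconnectedOutLayersNames()

img = cv2.imread("people.jpg")
h, w = img.shape[:2]

blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_layers)

boxes, confidences = [], []
for output in outputs:
    for det in output:                 # det = [cx, cy, bw, bh, objectness, class scores...]
        confidence = float(det[5:].max())
        if confidence > 0.5:
            cx, cy, bw, bh = det[0:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)

# OpenCV's built-in NMS removes overlapping detections (score threshold 0.5, IOU threshold 0.4)
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(keep).flatten():
    x, y, bw, bh = boxes[i]
    cv2.rectangle(img, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)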