wingman-jr-addon / wingman_jr

This is the official repository (https://github.com/wingman-jr-addon/wingman_jr) for the Wingman Jr. Firefox addon, which filters NSFW images in the browser fully client-side: https://addons.mozilla.org/en-US/firefox/addon/wingman-jr-filter/ It also offers optional DNS blocking using Cloudflare's 1.1.1.1 for Families. Also, check out the blog: https://wingman-jr.blogspot.com/

Group photos don't perform as well as expected #61

Closed by wingman-jr-addon 3 years ago

wingman-jr-addon commented 4 years ago

@abdullahezzat1 wrote in on a separate issue and I wanted to break it out to its own issue to continue the conversation: "I don't know a lot about machine learning, but I think the model may be missing some training for group photos. False positives are pretty low but the lower they get the better of course."

wingman-jr-addon commented 4 years ago

@abdullahezzat1 Can you tell me a bit more about the behavior you're seeing here? For example, is it failing to catch objectionable content within certain types of group photos?

abdullahezzat1 commented 4 years ago

Exactly - when an image contains many people in objectionable clothing, for example, the filter will probably fail to catch it.

wingman-jr-addon commented 4 years ago

One thing that might be happening to a certain degree is relative scale. Internally, the network's native input size is the fairly standard 224x224. The larger the group, the smaller each individual generally is. You can see this type of effect with images that are proportionately tall or wide as well - the resizing deforms the image too much for detection to work well. I do train with significant numbers of positive and negative examples of groups, though; for example, most swim teams are good hard negatives, and certain beach parties are positives. Unfortunately, the best way I can probably improve this is with more specifics about where it's failing, but I know that can be difficult. For a while I had a big false positive problem with pictures of babies (#22), so I took extra effort to work through that. As a side note, you may also be interested in #40.
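For concreteness, here's a rough sketch (not the addon's actual code - the function name and numbers are illustrative) of why squashing a whole photo to the model's 224x224 input hurts group shots: each person's footprint in model pixels shrinks as the image gets wider.

```python
MODEL_SIZE = 224  # network's native input size, per the comment above

def person_size_after_resize(img_w, img_h, person_w, person_h):
    """Return the (w, h) in model pixels that one person occupies after
    the whole image is resized (with distortion) to MODEL_SIZE x MODEL_SIZE."""
    return (person_w * MODEL_SIZE / img_w,
            person_h * MODEL_SIZE / img_h)

# A subject filling half the width of a 1000x1000 photo:
solo = person_size_after_resize(1000, 1000, 500, 800)      # → (112.0, 179.2)

# The same-sized person inside a wide 4000x1000 group shot -
# horizontally squashed to a quarter of that:
grouped = person_size_after_resize(4000, 1000, 500, 800)   # → (28.0, 179.2)
```

At 28 pixels wide, there simply isn't much signal left for the classifier, which matches the behavior described above.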

wingman-jr-addon commented 3 years ago

@abdullahezzat1 I'm doing a bit of cleanup on my open issues. I doubt this is fully resolved because the images are only so big, but I'm also guessing it has had some improvement in releases since this was opened. Is it still enough of an issue for you that I should keep the issue open?

wingman-jr-addon commented 3 years ago

@abdullahezzat1 I've been giving this some thought and trying to figure out the best way to handle it. There are ways you could tune the model for this, but one simple idea is to do something like a center crop on the image as a second pass. I have a branch with this concept if you want to check it out. The downside is that without something smarter in place, it definitely penalizes pictures with only a single person, since it artificially emphasizes their middle - so more false positives get flagged. I've been trying to think of ways to improve the heuristic, and I may be able to do a little better by choosing the section to center-crop based on the aspect ratio of the original image. If you have thoughts on it, let me know. Or if this isn't an issue anymore due to model improvements, let me know too and I can close it out.
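To make the center-crop second pass concrete, here's a minimal sketch of the crop geometry; this is a hypothetical illustration, not the branch's actual code, and the `fraction` value is an assumption.

```python
def center_crop_box(img_w, img_h, fraction=0.5):
    """Return a crop box (left, top, right, bottom) covering the middle
    `fraction` of the image in each dimension."""
    crop_w = int(img_w * fraction)
    crop_h = int(img_h * fraction)
    left = (img_w - crop_w) // 2
    top = (img_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)

# Example: the middle half of a 1000x800 photo:
box = center_crop_box(1000, 800)   # → (250, 200, 750, 600)
```

The second pass would then score both the full image and this cropped region, taking the worse (more objectionable) of the two scores - which is also why single-person photos get penalized: their torso area gets scored twice.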

abdullahezzat1 commented 3 years ago

It's better to preserve the quality of prediction for single-person pictures, and I don't think the crop solution is going to help because you can't really predict how the picture will be taken. I should say I'm not familiar with machine learning concepts, so this is just a non-technical opinion.

wingman-jr-addon commented 3 years ago

Ah, well, the choice of cropping has more to do with the nature of photos than with machine learning, so no worries on that front. It's true you can't predict exactly how a photo will be taken, but there are trends. Consider: most group photos are landscape rather than portrait, and to select an individual in the photo as a representative, you can slice out some part of the middle. That's all my thinking was based on. There are much more clever things you could do, but it seemed like an easy thing to try, and it definitely changed the results. On the actual machine learning side, I do remember reading one paper where they got better results by using something like 5 different crops within the image for NSFW detection, so they must have been thinking along a similar route (though they were not trying to catch group photos). A more advanced technique would be to run object detection to find humans, then roughly bound and classify each human as a single detection. Unfortunately, I don't think that would be worth the tradeoff at this point.
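As a sketch of the aspect-ratio idea (illustrative only - the threshold and tiling scheme are assumptions, not anything implemented in the addon): a wide landscape image could get several square crops tiled along its width, while a roughly square image keeps a single center crop.

```python
def crop_boxes(img_w, img_h, wide_ratio=1.5):
    """Return square crop boxes (left, top, right, bottom) sized to the
    shorter edge, tiled along the longer edge of a wide or tall image."""
    side = min(img_w, img_h)
    if img_w >= img_h * wide_ratio:        # landscape, likely a group shot
        n = max(2, round(img_w / side))
        step = (img_w - side) // (n - 1)
        return [(i * step, 0, i * step + side, img_h) for i in range(n)]
    if img_h >= img_w * wide_ratio:        # tall image: tile vertically
        n = max(2, round(img_h / side))
        step = (img_h - side) // (n - 1)
        return [(0, i * step, img_w, i * step + side) for i in range(n)]
    # Roughly square: fall back to a single center crop.
    left = (img_w - side) // 2
    top = (img_h - side) // 2
    return [(left, top, left + side, top + side)]
```

A 3000x1000 panorama would yield three 1000x1000 crops, each giving the people in that slice far more of the 224x224 input; a square photo stays a single crop, avoiding the single-person penalty of blind center-cropping.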

At any rate, I'll keep thinking but probably keep it as-is for now. I'm excited for 3.1 to come out now but Mozilla is having delays in their approval process so it's only available here on GitHub at the moment.

abdullahezzat1 commented 3 years ago

I wouldn't bother with the crop idea. Object detection is the way to go, but I'm guessing it would require reconsidering the whole model from scratch, and I'm guessing it would also be slower. If you're not going to do object detection, at least at the moment, then just close it for now.

wingman-jr-addon commented 3 years ago

Yes, it would be slower at this point, and it would require rethinking the model to incorporate object detection - as well as the dataset. It's also not clear to me that I want to go too heavily down the object detection route, even though it seems ideal. Sometimes the objectionable aspects of a picture depend on nearly global context within the image - for example, how the people are interacting rather than their individual poses. And again: probably slower.

If you ever have ideas for future improvements, I also have a separate repository for the model itself now - it'd be a great place to add more issues or ideas around the model.

As per your note, I'll close this for now.