Haroenv closed this issue 5 years ago
I am finding very similar issues as well:
Agreed. I have a plan to help fix this. One thing to note is that faces near microphones might exacerbate this bias. The original training data was provided by https://github.com/alexkimxyz/nsfw_data_scrapper
The input images seem to be slightly off due to the subreddits that fueled the dataset. Hopefully, I can counterbalance this and contribute back to that original repo.
I tried the opposite approach and looked at how it classified women who were predominantly covered up. I used two different types of images: two of women in traditional Muslim garb, and one of a woman in traditional Quaker garb. I then tried an image that (by my analysis) would be considered more "sexy", and the results were woefully incorrect.
As you can see, the predominantly covered group are all flagged as porn with high certainty, despite being anything but. Interestingly, they are also all flagged with a high "Neutral" rating.
Versus this woman in a revealing, but technically not "porn quality", swimsuit.
This would be a good lesson for the ML model: you could use these results to test for false positives. If the "Porn" and "Sexy" categories are the two highest, the image is most likely porn. Obviously, this would still let some false positives labeled as "porn" fall through the cracks, but it's a step in the right direction. Similarly, if the two highest categories are "Porn" and "Neutral", there is a good chance it is a false positive.
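The top-two rule above could be sketched on top of the predictions array that `nsfwjs.classify()` returns (objects with `className` and `probability`). Note the decision rule itself is this thread's suggestion, not library behavior, and the exact thresholds are left out as an assumption:

```javascript
// Sketch of the false-positive heuristic described above.
// Input shape assumed to match nsfwjs.classify() output:
//   [{ className: 'Porn', probability: 0.8 }, ...]
function interpretPredictions(predictions) {
  // Sort a copy descending by probability and take the two strongest classes.
  const top = [...predictions].sort((a, b) => b.probability - a.probability);
  const topTwo = new Set([top[0].className, top[1].className]);

  if (topTwo.has('Porn') && topTwo.has('Sexy')) {
    return 'likely porn'; // both explicit classes dominate
  }
  if (topTwo.has('Porn') && topTwo.has('Neutral')) {
    return 'likely false positive'; // "Porn" + high "Neutral" suggests a miss
  }
  return top[0].className; // otherwise trust the strongest class
}
```

Under this rule, the covered-up examples above (high "Porn" plus high "Neutral") would be routed to the "likely false positive" bucket instead of being flagged outright.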
Great examples so far! This perspective is very valuable. I filed an issue on the data-repo.
I've gotten lots of requests to share the dataset (for a bunch of reasons, not all for science). But with some public curation of the training data, the model can significantly improve. Adding a few hundred photos of false positives into the test/train will help.
As a call to action, I would love to crowdsource a fix for this problem. Feel free to let me know if you have a zip of false-positive images to add, or if you'd be interested in curation.
... The original training data was provided by https://github.com/alexkimxyz/nsfw_data_scrapper
@GantMan it'd be nice if you credited the original repo https://github.com/alexkimxyz/nsfw_data_scraper in both your medium blog post and this repo. Thanks.
You read my mind @alexkimxyz - you've been added to both! Sorry for the delay
Bearded man covered in snow:
My fingers
Hah, is there a subreddit of hands? If so we can add it.
Some more examples here: https://twitter.com/JustTenDads/status/1098502194196697088 (Jeff Goldblum)
RE: Jeff Goldblum
🤣
I'm working on a newly trained model. This takes DAYS on my home computer, so it will take time.
I appreciate everyone's feedback! You'll be happy to know I'm grabbing some updated training photos to hopefully help fix this bias... except for Jeff Goldblum, that stays. I'm going to lock this conversation for two reasons, which I hope are fair and clear.
So if you find category bias, please contribute back to the data scraper, it's easy! Those contributions will make it to the model, which will make it to NSFW JS.
I'm happy to announce that after a lot of tweaking, adjusting, and hours of additional training, I've increased the test set to reflect a broader array of images AND increased accuracy to 93% on that larger dataset.
I'm very happy to continue working on improving NSFW JS
Thank you all for your feedback, and I hope you find the following results pleasing.
If someone can suggest a large set of images of people singing into microphones, I'd love to add that to the training data. Please supply zip files with hundreds of examples in appropriate folders, if you'd like to help!
I'd like to thank everyone who helped in the spirit of making something creative and useful. Please keep in mind this is not my full-time job. I build things like this as a passion to help the community.
This model is about to be released. If you have old results, please clear your cache and make sure you get the latest model.
There seems to be a bias where, when an image contains a woman with any skin showing, the model marks it as porn, even when it clearly isn't.
There are some examples in this Twitter post, but I could reproduce it with different images too.