pywikibot-catfiles / file-metadata

A python package to analyze files and provide useful metadata
MIT License

License number plate detection for privacy reasons #50

Open jayvdb opened 8 years ago

jayvdb commented 8 years ago

Number plates should usually be obscured, so a bot categorising number plates will allow editors to photoshop the number plate, or a bot could also do it. See also https://github.com/nicolas-raoul/apps-android-commons/issues/190, which tries to prevent the problem at (one of) the sources.

Faces should also be obscured in some circumstances, but IMO a bot cannot know when.

jayvdb commented 8 years ago

Top-level cat: https://commons.wikimedia.org/wiki/Category:License_plates

Two cats for 'fixed' media: https://commons.wikimedia.org/wiki/Category:Images_with_blanked_out_license_plates and https://commons.wikimedia.org/wiki/Category:Images_with_blurred_out_license_plates

'Blurred' also contains revised plates e.g. https://www.flickr.com/photos/salford_ian/2183153995/ vs https://commons.wikimedia.org/wiki/File:Beardmore_taxicab.jpg

AbdealiLoKo commented 8 years ago

Reference : https://github.com/openalpr/openalpr

drtrigon commented 8 years ago

+1 for plate detection and geotagging, +1 for obscuring.

Although IMO obscuring should be done by a different bot script to keep things manageable. Also because it will be slower, since it needs to download AND upload images.

jayvdb commented 8 years ago

Ya, this bot shouldn't automatically obscure the file. It should categorise, and a human should check the problem before submitting it to an obscuring process.
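A minimal sketch of that categorise-only step, under stated assumptions: the function name `suggest_categories`, the 0-100 confidence scale, the default threshold, and the shape of the `detections` input are all hypothetical and not part of file-metadata. The only name taken from this thread is the top-level category.

```python
from typing import Dict, List

# Top-level category mentioned earlier in this thread.
PLATE_CATEGORY = "Category:License plates"


def suggest_categories(detections: List[Dict], min_confidence: float = 80.0) -> List[str]:
    """Return categories to suggest for a file; the file itself is never modified.

    `detections` is a hypothetical list of dicts, one per plate reported by
    some detector, each with a "confidence" key on an assumed 0-100 scale.
    """
    confident = [d for d in detections if d.get("confidence", 0.0) >= min_confidence]
    # Only categorise here; obscuring is left to a separate, human-reviewed process.
    return [PLATE_CATEGORY] if confident else []
```

Keeping this step free of any image editing means a false positive costs only a wrong category, which a human can remove, rather than a damaged upload.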

drtrigon commented 8 years ago

Storing additional data, such as the position of the plate, and letting other processes (like a bot) access it could be useful here.
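One possible shape for such a record, as a rough sketch only: the class name `PlateRegion`, its fields, and the JSON layout are assumptions made for illustration, not an existing file-metadata or Commons schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class PlateRegion:
    """Position of one detected plate in pixel coordinates (hypothetical schema)."""
    file_title: str
    corners: List[Tuple[int, int]]  # polygon corners as (x, y)

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Axis-aligned box (left, top, right, bottom) enclosing the corners."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)

    def to_json(self) -> str:
        """Serialise so another process (e.g. an obscuring bot) can read it."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "PlateRegion":
        """Rebuild a record from the string produced by to_json()."""
        data = json.loads(text)
        return cls(data["file_title"], [tuple(c) for c in data["corners"]])
```

A usage example with a placeholder title and made-up coordinates:

```python
region = PlateRegion("File:Example.jpg", [(10, 20), (110, 22), (108, 60), (9, 58)])
stored = region.to_json()                      # written by the categorising bot
box = PlateRegion.from_json(stored).bounding_box()  # read by a later process
```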

jayvdb commented 8 years ago

A basic database will be useful. But as noted in a previous meeting, a database is coming to Wikimedia Commons (https://phabricator.wikimedia.org/T68108), with the first prototype out (https://phabricator.wikimedia.org/T125822), and it is worth waiting for that. Otherwise we build a database on labs that won't be adopted, and we'll need to rewrite the database code to store the metadata on Commons.

More info at https://commons.wikimedia.org/wiki/Commons:Structured_data