tfaehse / DashcamCleaner

Censor identifiable information in videos, in particular dashcam recordings in Germany.
GNU Affero General Public License v3.0

data donation #5

Open breunigs opened 2 years ago

breunigs commented 2 years ago

I have ~29h of video (5x speed up, so technically ~145h), including understand.ai's labels. It is definitely a different camera than yours. I can donate that for retraining.

Since their model has some unfortunate misses and inaccurate bounding boxes, I also went ahead and started labeling manually. That dataset is currently only ~10k images with ~18k head¹ and ~9k plate bounding boxes. I only kept boxes that are "large enough", to avoid training the model on bboxes of only a few pixels. I can also donate the model trained on these bboxes at 1280px, which proved sufficient for my use case, as the small bboxes are not discernible anyway. The model is definitely not perfect; here's an example inference on footage that was not used for training/validation: https://vimeo.com/660127406 . Note that the model is trained from yolov5x6 and is 1.1GB, since "deployability to someone else's computer" was not a concern.

¹: I use a "head + neckline" bounding box, rather than just "face" as anonymizer does, i.e. the two label sets are incompatible.
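To illustrate the "large enough" filter mentioned above, here is a minimal sketch, assuming YOLO-style normalized labels (class, x_center, y_center, width, height). The 12 px threshold, the class ids (0 = head, 1 = plate) and the function name are hypothetical and only chosen for illustration; they are not the values used for the actual dataset.

```python
from typing import List, Tuple

# One YOLO-style label: (class_id, x_center, y_center, width, height),
# with all coordinates normalized to [0, 1] relative to the image size.
YoloBox = Tuple[int, float, float, float, float]


def filter_small_boxes(
    boxes: List[YoloBox],
    img_w: int,
    img_h: int,
    min_side_px: float = 12.0,  # hypothetical threshold, not from the thread
) -> List[YoloBox]:
    """Keep only boxes whose width AND height are at least min_side_px pixels."""
    kept = []
    for cls, xc, yc, w, h in boxes:
        if w * img_w >= min_side_px and h * img_h >= min_side_px:
            kept.append((cls, xc, yc, w, h))
    return kept


# Hypothetical class ids: 0 = head, 1 = plate.
example_boxes = [
    (0, 0.50, 0.50, 0.100, 0.200),  # 128 x 144 px in a 1280x720 frame: kept
    (1, 0.20, 0.80, 0.005, 0.004),  # about 6 x 3 px: dropped
]
filtered = filter_small_boxes(example_boxes, img_w=1280, img_h=720)
print(filtered)
```

Filtering at label level like this keeps the images themselves in the dataset, so tiny objects simply become unlabeled background; whether that is desirable depends on the use case.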

tfaehse commented 2 years ago

Hey @breunigs, that looks amazing! I'd be very, very happy if you could send me some of your data - especially since you have a lot of faces/heads in there. They're incredibly underrepresented in my dataset, and the (comparatively) few faces I have aren't always perfectly labeled either. I don't drive all that much, and when I do it's pretty much just highways. I might take a page out of your book and start taking a camera when I ride my bike though!

I'm not sure how best to share the data with me privately. Aside from GitHub, you can reach me via email at my GitHub username (at) me.com, or by any other means you prefer, really. Thank you so much!

breunigs commented 2 years ago

I invited you to a private GitHub repo; GitHub should have sent you an email.

tfaehse commented 2 years ago

They did. Thank you once more!! It will take (quite) a while, but I'll of course post the updated trained models here, along with some slight changes to account for the neck/face distinction you're making. It's a great idea.

I'll start the download with your command and a rate limit of 10 MiB/s. Do tell me (here or wherever) if I'm straining the server too much.

breunigs commented 2 years ago

You don't need to constrain yourself that much; 100 Mbit/s should be no problem.

breunigs commented 2 years ago

Oh wait, never mind, I missed that you wrote MiB/s, so yeah, that sounds reasonable :)
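For anyone tripping over the same units: MiB/s counts binary megabytes (1 MiB = 1024 * 1024 bytes), while Mbit/s counts decimal megabits (1 Mbit = 1,000,000 bits). A small sketch of the conversion follows; the function names are made up for illustration.

```python
BYTES_PER_MIB = 1024 * 1024   # binary prefix (IEC): 1 MiB = 2**20 bytes
BITS_PER_MBIT = 1_000_000     # decimal prefix (SI): 1 Mbit = 10**6 bits


def mib_per_s_to_mbit_per_s(mib_per_s: float) -> float:
    """Convert a rate in MiB/s (bytes, binary) to Mbit/s (bits, decimal)."""
    return mib_per_s * BYTES_PER_MIB * 8 / BITS_PER_MBIT


def mbit_per_s_to_mib_per_s(mbit_per_s: float) -> float:
    """Convert a rate in Mbit/s (bits, decimal) to MiB/s (bytes, binary)."""
    return mbit_per_s * BITS_PER_MBIT / 8 / BYTES_PER_MIB


rate_limit = mib_per_s_to_mbit_per_s(10)    # the 10 MiB/s download limit
link_budget = mbit_per_s_to_mib_per_s(100)  # the 100 Mbit/s mentioned above
print(f"10 MiB/s   = {rate_limit:.1f} Mbit/s")
print(f"100 Mbit/s = {link_budget:.1f} MiB/s")
```

So a 10 MiB/s limit is roughly 84 Mbit/s, which sits just under the 100 Mbit/s figure.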