openphilanthropy / unrestricted-adversarial-examples

Contest Proposal and infrastructure for the Unrestricted Adversarial Examples Challenge
Apache License 2.0

Acquiring more labelled training images #81

Open davidwagner opened 3 years ago

davidwagner commented 3 years ago

Currently the 0.0.4 dataset provides 125 training images of each class. If we want to train on more images, are there any resources to make it easier to acquire more labelled images that are valid and unambiguous, or do we need to re-implement the tasker evaluation ourselves?

If we use the IDs in bird-or-bicycle/bird_or_bicycle/metadata/0.0.4/, it looks like we can get close to 1000 more tasker-verified images of birds, but no more images of bicycles are available for training from there. Is there anything else I'm missing?

carlini commented 3 years ago

I don't think we've collected any more high-quality labeled examples for the train split. The extra dataset has something like 27k more images that we've found helpful for training a classifier; I've been able to train a single linear layer on top of ImageNet features using the extra dataset and get ~99% test accuracy. But as you say, those images aren't filtered for validity.
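For reference, the linear-layer setup mentioned above can be sketched as a standard linear probe: fit logistic regression on precomputed backbone features. This is a minimal sketch, not the exact pipeline used; the synthetic feature arrays here stand in for ImageNet-pretrained features extracted from the extras images.

```python
# Linear probe sketch: a single linear layer (logistic regression)
# trained on top of fixed image features. The features below are
# synthetic stand-ins; in practice they would come from running an
# ImageNet-pretrained backbone over the extras images.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend features: 2048-dim vectors (ResNet-style) for two classes
# (0 = bird, 1 = bicycle), with a class-dependent shift so the
# classes are roughly linearly separable.
n, d = 400, 2048
labels = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, d)) + labels[:, None] * 0.2

# Fit the probe on the first 300 examples, evaluate on the rest.
clf = LogisticRegression(max_iter=1000)
clf.fit(features[:300], labels[:300])
acc = clf.score(features[300:], labels[300:])
print(f"held-out accuracy: {acc:.2f}")
```

Because the backbone is frozen, only the linear weights are learned, which is why a few hundred labeled examples per class can already give high accuracy.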

davidwagner commented 3 years ago

Thank you. It seems like getting more images of bicycles will take the most work. In one random sample of bicycles from extras/, 1/20 (5%) looked to me like they met the requirements; in a second random sample, 4/34 (12%) did; yet tasker_labels_0.0.4.csv suggests about 289/1322 (22%) met the requirements. I'm not sure why there was such variability among those three estimates (perhaps you did some filtering before feeding images to taskers, or perhaps I just got unlucky in my random samples?). So if we filter extras ourselves, I'm guessing we might be able to obtain ~10,000 good training images of birds and roughly 800 to 3,000 good training images of bicycles. Thanks for the information.
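The filtering step above could be sketched by joining the extras images against the tasker verdicts in the CSV. This is a minimal sketch under assumptions: the column names ("image_id", "meets_requirements") and the verdict encoding are hypothetical and should be checked against the actual header of tasker_labels_0.0.4.csv.

```python
# Sketch: keep only extras images that taskers marked as meeting the
# requirements. Column names and the "1" verdict encoding are
# assumptions, not the file's confirmed schema.
import csv
import io

# Stand-in for the contents of tasker_labels_0.0.4.csv.
csv_text = """image_id,meets_requirements
img_001,1
img_002,0
img_003,1
"""

keep = [
    row["image_id"]
    for row in csv.DictReader(io.StringIO(csv_text))
    if row["meets_requirements"] == "1"
]
print(keep)  # IDs of images that passed tasker filtering
```

In practice one would read the real CSV from disk and use the surviving IDs to select files from extras/ before training.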