openphilanthropy / unrestricted-adversarial-examples

Contest Proposal and infrastructure for the Unrestricted Adversarial Examples Challenge
Apache License 2.0

question about the dataset #59

Closed AngusG closed 5 years ago

AngusG commented 5 years ago

I believe there are a few erroneous images in the training set for bird (which is significant when there are only 500 images per class).

Q1 in Appendix ".1 Instructions given to taskers" says:

Does this photo contain a bird, or a depiction of a bird (e.g., a toy bird, a painting of a bird, a stuffed animal bird, a cartoon bird) anywhere in the image?

However, point 4 and the italicized text below suggest that paintings/depictions are not allowed:

(It is okay if the object is a photorealistic rendering of a bird/bicycle.) ... is not truncated, is not occluded, and is not a depiction of any sort.

In any case, I do not believe this black and white sketch from the bird training set should be included:

2e8cd9f60546ac55

There's also:

954c63e11aa1232e

And this image (also from the bird training set) is of a moth: 6b3797dec5bea3c7

AngusG commented 5 years ago

Also, there are several truncated bird images where either only the bird's head is visible, or the head is missing. Instruction 4 says:

  4. This bird/bicycle is complete and not truncated. It does not go outside of the image at all

But it's unlikely taskers would make it this far, since I would have answered "Definitely yes" to Q1 for all of these samples. I don't see how representations learned on bird heads can be expected to generalize to birds at an arbitrary distance from the camera, especially when there is no requirement that the bird be facing the camera.

e77595104d2f62a4 6a8cf5ae66f3f4b2 3b5b81359c0f337c

This bird has no head: 3f9b4b2ad5991d96
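For anyone who wants to set the samples listed in this thread aside locally, here is a minimal sketch. It assumes (hypothetically) that each image file is named after its ID, e.g. `2e8cd9f60546ac55.jpg`. The `split_flagged` helper and the file layout are illustrative assumptions, not part of the repository's tooling.

```python
from pathlib import PurePath

# Image IDs reported in this thread as problematic
# (depictions, a moth, truncated or headless birds).
FLAGGED_IDS = {
    "2e8cd9f60546ac55",
    "954c63e11aa1232e",
    "6b3797dec5bea3c7",
    "e77595104d2f62a4",
    "6a8cf5ae66f3f4b2",
    "3b5b81359c0f337c",
    "3f9b4b2ad5991d96",
}


def split_flagged(paths, flagged_ids=FLAGGED_IDS):
    """Partition paths into (kept, flagged) by file stem.

    Assumption: the file stem equals the image ID (hypothetical layout).
    """
    kept, flagged = [], []
    for path in paths:
        stem = PurePath(path).stem
        (flagged if stem in flagged_ids else kept).append(path)
    return kept, flagged


if __name__ == "__main__":
    # The second path is a made-up placeholder, not a real image ID.
    example = [
        "train/bird/6b3797dec5bea3c7.jpg",
        "train/bird/placeholder_id.jpg",
    ]
    kept, flagged = split_flagged(example)
    print("kept:", kept)
    print("flagged:", flagged)
```

Returning the flagged paths instead of deleting them keeps the samples available for inspection or for some other use.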

nottombrown commented 5 years ago

Thanks for bringing this to our attention @AngusG. I agree that those datapoints are erroneous and will investigate how they got into the training set.

nottombrown commented 5 years ago

Resolved by https://github.com/google/unrestricted-adversarial-examples/pull/60

Thanks again for noticing this @AngusG

AngusG commented 5 years ago

Thanks for dealing with this so quickly, glad to know I interpreted the data spec correctly. I noticed this ended up affecting even more images, which were then moved to the extras split. Do you have a spec or intended use case for the extras in mind? For instance, I can see some kind of unsupervised pre-training based on the moth image still being useful for detecting birds since it's a natural scene where the object of focus has a similar shape and context as a bird image.

nottombrown commented 5 years ago

Exactly. The idea behind extras is that it would be useful for training classifiers even though it is less pristine than the IID train and test data.

The SVHN dataset follows this convention as well.
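A minimal sketch of how such a split might be consumed, assuming the SVHN-style convention of optionally adding extras to the training pool while evaluating only on the clean test split. The `build_training_pool` helper and the example IDs are hypothetical, and nothing here reflects the repository's actual loader.

```python
def build_training_pool(train_ids, extras_ids, use_extras=True):
    """Return (image_id, split) pairs for training.

    Tagging each sample with its split of origin makes it possible to
    down-weight or drop the less pristine extras later. Test data is
    deliberately not accepted here, so it cannot leak into training.
    """
    pool = [(image_id, "train") for image_id in train_ids]
    if use_extras:
        pool += [(image_id, "extras") for image_id in extras_ids]
    return pool


if __name__ == "__main__":
    # "clean_example_id" is a placeholder; the second ID is the moth image
    # from this thread, used only as an illustration of an extras sample.
    pool = build_training_pool(["clean_example_id"], ["6b3797dec5bea3c7"])
    for image_id, split in pool:
        print(split, image_id)
```

Passing `use_extras=False` gives a train-only baseline, which is one way to check whether the extras actually help.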