nightrome / cocostuff10k

The official homepage of the (outdated) COCO-Stuff 10K dataset.
https://arxiv.org/abs/1612.03716

COCO json annotation format? #2

Closed (ahundt closed 7 years ago)

ahundt commented 7 years ago

Is COCO-Stuff available in the COCO JSON annotation format?

nightrome commented 7 years ago

Unfortunately not yet. In fact, we are currently working on annotating all images in COCO, and the final dataset will be provided in JSON format. Could you tell me what use case you have in mind?

ahundt commented 7 years ago

I have Python code that can load the COCO dataset and train a segmentation classifier, so if the annotations used the same JSON format I could train on them simply by changing which annotation file I load. Thanks for the consideration!
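A minimal sketch of that idea, assuming both annotation files follow the usual COCO layout with a top-level "annotations" list whose entries carry an "image_id". The file names in the usage note are hypothetical, and only the standard library is used here rather than pycocotools:

```python
import json
from collections import defaultdict


def index_by_image(dataset):
    """Group the annotations of a COCO-style dict by their image_id."""
    by_image = defaultdict(list)
    for ann in dataset.get("annotations", []):
        by_image[ann["image_id"]].append(ann)
    return dict(by_image)


def load_annotations_by_image(annotation_path):
    """Read a COCO-style JSON file and index its annotations by image.

    Switching datasets then only means passing a different path.
    """
    with open(annotation_path, "r", encoding="utf-8") as f:
        return index_by_image(json.load(f))
```

With this, training code would call something like load_annotations_by_image("things.json") or load_annotations_by_image("stuff.json") (both names made up for illustration) and leave the rest of the pipeline unchanged.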

nightrome commented 7 years ago

Okay, I'll look into it. The only problem is that we don't have a notion of instances, as there is for the COCO things (1 car = 1 instance). So I guess I'll go with individual superpixels or connected components for now.

ahundt commented 7 years ago

Cool! That will actually make this better than the original COCO for my use cases. I'm not using instances at the moment, so I didn't consider that difference. Perhaps there is a clean way to handle it that would keep most existing code that uses upstream COCO working correctly?

Some possibilities:

  1. Mark everything as a single instance and clearly explain this discrepancy up front in the docs/comments for the dataset
  2. Provide a patched fork of pycocotools that handles non-instance segmentations
    • Perhaps the upstream coco would also accept such a pull request if no breaking changes are introduced.
  3. Actually add instance annotations
    • This is super complicated and time intensive so I understand this is likely impractical.

Thanks again for considering this feedback!

nightrome commented 7 years ago

Now I have created the JSON annotations. The solution I took was to create one annotation (in JSON) per label that is present in an image (not superpixels or connected components). To avoid overlap with the COCO thing labels I mapped the stuff labels to the range 92-182. For more information see https://github.com/nightrome/cocostuff#json-format.
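To illustrate the "one annotation per label present in an image" approach, here is a simplified sketch. It assumes the input is a 2D list of integer label IDs (a hypothetical format) and emits only image_id, category_id and a pixel-count area; the real JSON described at the link above contains more fields, and this is not the conversion code actually used:

```python
from collections import Counter

# Stuff label range quoted in the comment above.
STUFF_ID_MIN, STUFF_ID_MAX = 92, 182


def annotations_per_label(image_id, label_map):
    """Return one annotation dict per stuff label present in label_map.

    label_map is a 2D list of integer label IDs (assumed input format).
    """
    counts = Counter(label for row in label_map for label in row)
    annotations = []
    for category_id in sorted(counts):
        if not STUFF_ID_MIN <= category_id <= STUFF_ID_MAX:
            continue  # skip thing labels and unlabeled pixels
        annotations.append({
            "image_id": image_id,
            "category_id": category_id,
            "area": counts[category_id],  # number of pixels with this label
        })
    return annotations
```

Because each label yields a single annotation regardless of how many disconnected regions it covers, code that expects one annotation per object instance should treat these entries as per-class masks.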

I would appreciate it if you could test the JSON format with your code and see if it behaves the same way as COCO.

ahundt commented 7 years ago

Fantastic! With these, should I be able to use the images from the original COCO dataset or should I download your version linked on the readme separately?

nightrome commented 7 years ago

Hi, the images are just copied, so the originals are at least as good. Note that our JSON contains only the stuff annotations and no thing annotations, so you'll still need: http://msvocds.blob.core.windows.net/coco2014/train2014.zip
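Since the stuff JSON carries no thing annotations, anyone wanting both would have to combine the two sources. A rough sketch under stated assumptions: both inputs are COCO-style dicts with "images", "categories" and "annotations" lists, the category IDs do not overlap (the stuff labels were mapped to 92-182 for this reason), and annotation IDs are renumbered because they might collide. The function name merge_coco is made up for illustration:

```python
def merge_coco(things, stuff):
    """Merge two COCO-style dicts that describe the same set of images."""
    # Keep each image once, keyed by its id.
    images = {img["id"]: img for img in things.get("images", [])}
    for img in stuff.get("images", []):
        images.setdefault(img["id"], img)

    # Concatenate annotations, renumbering ids to avoid collisions
    # (assumption: the original ids are not needed downstream).
    annotations = []
    for ann in things.get("annotations", []) + stuff.get("annotations", []):
        ann = dict(ann)  # copy so the inputs are left untouched
        ann["id"] = len(annotations) + 1
        annotations.append(ann)

    return {
        "images": list(images.values()),
        "categories": things.get("categories", []) + stuff.get("categories", []),
        "annotations": annotations,
    }
```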