ozendelait / rvc_devkit

Robust Vision Challenge Devkits
http://www.robustvision.net/
MIT License

Decide target format for joint IS data #3

Closed ozendelait closed 4 years ago

ozendelait commented 4 years ago

Choices:
- JSON polygons
- mask file (e.g. Cityscapes instanceIds 16-bit PNG)
- same as Panoptic (COCO panoptic format = 24-bit PNG + JSON for the maskid <-> classid mapping)

ozendelait commented 4 years ago

I have not used instance segmentation frameworks; what is the most common denominator? Using polygons instead of masks may be a more complex mapping problem for datasets that have mask-based GT. I also don't think polygons help: most datasets I have seen only supply one value per pixel (i.e. no information about the occluded extents of labels), so there is no direct benefit to using them.

cocoapi can transform between mask and RLE formats: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py
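To make the mask/RLE point concrete, here is a toy sketch of uncompressed run-length encoding of a binary mask. This is only an illustration, not the cocoapi implementation: pycocotools additionally flattens masks in column-major order and supports a compressed string RLE, which this sketch omits.

```python
def rle_encode(flat_mask):
    """Toy uncompressed RLE: alternating run lengths over a flat 0/1 mask,
    starting with the count of zeros (the convention COCO RLE also uses)."""
    counts, prev, run = [], 0, 0
    for v in flat_mask:
        if v == prev:
            run += 1
        else:
            counts.append(run)
            prev, run = v, 1
    counts.append(run)
    return counts

def rle_decode(counts):
    """Inverse of rle_encode: expand alternating run lengths back to 0/1."""
    out, val = [], 0
    for c in counts:
        out.extend([val] * c)
        val ^= 1
    return out

mask = [0, 0, 1, 1, 1, 0, 1]
counts = rle_encode(mask)          # [2, 3, 1, 1]
assert rle_decode(counts) == mask  # round-trips losslessly
```

For long runs (large uniform regions, which dominate segmentation masks) this representation is far smaller than the raw per-pixel mask.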

rodrigob commented 4 years ago

COCO (non-panoptic) and OID both use binary encodings. For panoptic, one would need a non-binary encoding.

For the panoptic format, "encoded color image + json" seems reasonable to me: it handles raw masks, can be easily inspected, and scales up to many instances and many classes. For the mask encoding, 16 bits per pixel are possibly enough (a hard limit of no more than 65k objects per image seems safe to me in 2020). The question is choosing the encoding.
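For reference, COCO panoptic packs each segment id into the three 8-bit channels of the color PNG; the sketch below shows the conversion used by the official panopticapi (id = R + 256*G + 256²*B) and its inverse.

```python
def rgb2id(r, g, b):
    # COCO panoptic stores the segment id across the 8-bit channels,
    # least-significant byte in R: id = R + 256*G + 256**2 * B
    return r + 256 * g + 256 * 256 * b

def id2rgb(seg_id):
    # Inverse: unpack a segment id (< 2**24) back into an (R, G, B) pixel.
    return (seg_id % 256, (seg_id // 256) % 256, (seg_id // 256 ** 2) % 256)

assert rgb2id(*id2rgb(123456)) == 123456  # round-trips for any 24-bit id
```

A 16-bit single-channel PNG would instead store the id directly, capping instances per image at 2**16 = 65536, as noted above.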

Some outcomes of our previous explorations of the topic:

To decide between tuned PNG, lossless WebP, and TIFF, some small-scale experiments would be needed.

ozendelait commented 4 years ago

We could support both PNG and RLE via the existing cocoapi, right?

Re PNG encoding: how does the PNG tuning work? IMHO TIFF is a mess: it can contain basically any encoding you want (i.e. it is very incompatible with many image libraries). WebP is a good idea, but I'm unsure how many training frameworks can read it. Also: what file sizes are we actually talking about? I thought that 90-95% of the consumed size comes from the intensity/RGB input images and only a small portion from the annotations. If so, then PNG may be the best choice, even the 24-bit panoptic PNG from COCO.

rodrigob commented 4 years ago

As a reference for "default png" vs WebP, I just ran cwebp -lossless -m 6 -q 100 (highest-compression lossless)

over ~300 png annotations from coco2017 panoptic val set.

The PNG files sum up to 2.2 MB; the same files as WebP sum up to 801 KB. That is a factor of ~2.7x in size. Scaled over 100 GB of data, that adds up...

The compression does take a few seconds per image, I have not played with the speed vs lossless compression trade-off.

rodrigob commented 4 years ago

webp: I'm unsure how many training frameworks can read it?

I am not familiar with all kinds of frameworks. It is supported by the "standard" imageio: https://imageio.readthedocs.io/en/stable/format_webp-fi.html

and by Tensorflow https://www.tensorflow.org/io/api_docs/python/tfio/image/decode_webp

what file sizes are we actually talking about? I thought that 90-95% of the consumed size comes from the intensity/RGB input images and only a small portion from the annotations. If so, then PNG may be the best choice, even the 24-bit panoptic PNG from COCO.

This is a good point.

Reading the top of http://cocodataset.org/#download: coco2017 train+val images are ~19 GB, while the panoptic annotations are 1 GB. So it is indeed probably not worth obsessing over annotation file sizes, and vanilla PNG is fine.
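The back-of-the-envelope check behind "not worth obsessing": with the figures from the COCO download page, the annotations are only about 5% of the total download, so even a 2.7x compression win on them shaves just a few percent off the whole dataset.

```python
# Figures quoted from http://cocodataset.org/#download (approximate)
images_gb = 19.0       # coco2017 train+val images
annotations_gb = 1.0   # coco2017 panoptic annotations

frac = annotations_gb / (images_gb + annotations_gb)
print(f"annotations are {frac:.0%} of the total")  # ~5%

# Best-case total saving if annotations shrink 2.7x (the measured WebP factor):
saving = frac * (1 - 1 / 2.7)
print(f"overall saving: {saving:.1%}")  # ~3% of the whole download
```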

(we focused on this for OID, because it was a bottleneck for submission files)

Bottom line: I think that the same format as COCO panoptic is a good choice.

ozendelait commented 4 years ago

Great, I think this can reduce the complexity of the dev kit as well :)

ozendelait commented 4 years ago

Without additional input, we have decided on the COCO panoptic format (24-bit PNG + JSON) for instance segmentation (with the option of supporting binary RLE from pycocotools). Please add comments if you disagree!
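For anyone implementing against this decision, here is a minimal sketch of the JSON side of the chosen format: each segment id that appears in the panoptic PNG is mapped to a class via a segments_info entry. The file name and ids below are made up for illustration; the field names follow the COCO panoptic annotation schema.

```python
import json

# Hypothetical excerpt of one image's COCO-panoptic-style annotation:
# segment ids (as decoded from the 24-bit PNG) map to category ids here.
ann = json.loads("""
{
  "file_name": "example_panoptic.png",
  "segments_info": [
    {"id": 3226956, "category_id": 1,  "iscrowd": 0},
    {"id": 6979964, "category_id": 17, "iscrowd": 0}
  ]
}
""")

# Build the maskid -> classid lookup the devkit would use per image.
id_to_class = {seg["id"]: seg["category_id"] for seg in ann["segments_info"]}
assert id_to_class[3226956] == 1
```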

ozendelait commented 4 years ago

Ok, looks like we agree on COCO png; issue closed.