thodan / bop_toolkit

A Python toolkit of the BOP benchmark for 6D object pose estimation.
http://bop.felk.cvut.cz
MIT License

Dataset tools #87

Closed ylabbe closed 1 year ago

ylabbe commented 1 year ago

The two main features of BOP-imagewise are:

BOP-webdataset is composed of a few shards (.tar files), each containing 1000 images with annotations. It is created simply by randomly shuffling the images from BOP-imagewise and splitting them across .tar files. Since the images are stored in .tar files, fast sequential reads of the shards can be achieved. The format is also convenient to share, as a user can download only a subset of a dataset by selecting a subset of the shards.
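For illustration, a minimal sketch of how a single shard could be read sequentially with Python's standard tarfile module. The member-name suffixes (`<key>.rgb.jpg`, `<key>.gt.json`) and the example shard name are assumptions made for the sketch, not the exact layout; see bop_webdataset.py for the actual naming.

```python
import io
import json
import tarfile
from collections import defaultdict

from PIL import Image


def read_shard(shard_path):
    """Sequentially read one .tar shard and group its files by sample key.

    Member names are assumed to look like '<key>.<suffix>' (e.g.
    '000123.rgb.jpg', '000123.gt.json'); adjust the parsing to the
    actual naming used inside the shards.
    """
    samples = defaultdict(dict)
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:  # members are visited in archive order (sequential read)
            if not member.isfile():
                continue
            key, _, suffix = member.name.partition(".")
            data = tar.extractfile(member).read()
            if suffix.endswith("json"):
                samples[key][suffix] = json.loads(data)
            elif suffix.endswith(("jpg", "png")):
                samples[key][suffix] = Image.open(io.BytesIO(data))
            else:
                samples[key][suffix] = data
    return samples


# Usage (hypothetical shard name):
# samples = read_shard("ycbv_train_pbr_000000.tar")
```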

NOTES:

Download links:

- YCBV train_pbr imwise format
- YCBV train_pbr webdataset format
- megapose-shapenet (web format)
- megapose-gso (web format)

thodan commented 1 year ago

@ylabbe Shall we refer to the two new formats as BOP-v2 and BOP-v2-webdataset? If I understand it correctly, one could pack either v1 or v2 into the webdataset format, so it would be good to make it explicit which version is packed.

thodan commented 1 year ago

@ylabbe "The object and camera annotations are stored in individual files (one file per image instead of one per chunk). This allows faster reading of individual images and annotations."

I don't think it's generally true that this yields faster reading (reading many small files tends to be slower on shared network storage than reading one large file). Instead of "faster", let's say "more convenient".

thodan commented 1 year ago

From the benchmarking results, it seems that webdataset is "only" ~2X faster in sequential access while much slower in random access (compared to v2). If someone wants to do random access, it means they need to convert webdataset to v2.

I'm wondering whether it is worth releasing both v2 and webdataset (and thus increasing the maintenance complexity and the required space on our servers), since webdataset doesn't seem to bring significant benefits?

I think we could just release v2. If interested, people could then convert v2 to webdataset on their own using the script Yann wrote. Do you agree? @ylabbe, @MartinSmeyer

MartinSmeyer commented 1 year ago

Hmm, imo the web format might be useful for the large, new training datasets where you can't load annotations/masks into memory at the start of training. But I don't see much benefit in bop-v2 over bop-v1? Yann's benchmark is on loading all files, but the masks can just be loaded quickly from the COCO gt data files (per scene) that we provide in RLE format in bop-v1. We could easily extend them to amodal masks without providing a new format.
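To illustrate the RLE point, a small sketch of decoding the instance masks of one image from a per-scene COCO gt file with pycocotools. The file path and the assumption that the file follows the standard COCO layout ('annotations', 'image_id', 'segmentation') are illustrative and may differ from the actual BOP files.

```python
import json

from pycocotools import mask as mask_utils


def load_instance_masks(coco_gt_path, im_id):
    """Decode the RLE instance masks belonging to one image.

    Assumes the standard COCO layout: an 'annotations' list whose entries
    carry 'image_id' and an RLE 'segmentation' ({'size': [h, w], 'counts': ...}).
    """
    with open(coco_gt_path, "r") as f:
        coco_gt = json.load(f)

    masks = []
    for ann in coco_gt["annotations"]:
        if ann["image_id"] != im_id:
            continue
        rle = ann["segmentation"]
        # pycocotools expects compressed RLE counts as bytes, not str.
        if isinstance(rle["counts"], str):
            rle = dict(rle, counts=rle["counts"].encode("ascii"))
        masks.append(mask_utils.decode(rle))  # HxW uint8 binary mask
    return masks


# Usage (hypothetical path):
# masks = load_instance_masks("train_pbr/000000/scene_gt_coco.json", im_id=0)
```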

v2 also sounds like v1 is obsolete. I would at least be a bit more careful with the naming.

Just my opinion, if you both think bop-v2 is worth it, then go for it. :)

MartinSmeyer commented 1 year ago

(The per-instance binary masks could become optional downloads in bop-v1 as well.)

ylabbe commented 1 year ago

> @ylabbe Shall we refer to the two new formats as BOP-v2 and BOP-v2-webdataset? If I understand it correctly, one could pack either v1 or v2 into the webdataset format, so it would be good to make it explicit which version is packed.

The BOP-scenewise format cannot easily be packed into webdataset; we would have to store all images of a scene in a single file. Please look at the top of bop_toolkit_lib/dataset/bop_imagewise.py and bop_toolkit_lib/dataset/bop_webdataset.py for a description of the formats.

> @ylabbe "The object and camera annotations are stored in individual files (one file per image instead of one per chunk). This allows faster reading of individual images and annotations."
>
> I don't think it's generally true that this yields faster reading (reading many small files tends to be slower on shared network storage than reading one large file). Instead of "faster", let's say "more convenient".

I meant faster compared to loading the annotations for the entire scene, but it is true that the annotations file can be pre-loaded for relatively small datasets. I updated the PR description.
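To make the scenewise vs. imagewise comparison concrete, a rough sketch of the two access patterns. The scenewise side follows the existing per-scene scene_gt.json layout (keyed by image id); the imagewise file name is only a hypothetical example of a per-image annotation file.

```python
import json
from pathlib import Path


def load_gt_scenewise(scene_dir, im_id):
    """Scenewise: one scene_gt.json per scene, keyed by image id.

    The whole file has to be read (or pre-loaded once) to obtain the
    annotations of a single image.
    """
    with open(Path(scene_dir) / "scene_gt.json", "r") as f:
        scene_gt = json.load(f)
    return scene_gt[str(im_id)]


def load_gt_imagewise(split_dir, key):
    """Imagewise: one small annotation file per image (file name is illustrative)."""
    with open(Path(split_dir) / f"{key}.gt.json", "r") as f:
        return json.load(f)
```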

> From the benchmarking results, it seems that webdataset is "only" ~2X faster in sequential access while much slower in random access (compared to v2). If someone wants to do random access, it means they need to convert webdataset to v2.
>
> I'm wondering whether it is worth releasing both v2 and webdataset (and thus increasing the maintenance complexity and the required space on our servers), since webdataset doesn't seem to bring significant benefits?
>
> I think we could just release v2. If interested, people could then convert v2 to webdataset on their own using the script Yann wrote. Do you agree? @ylabbe, @MartinSmeyer

Unpacking webdataset to imagewise is fairly easy and can be done with a single bash command that unpacks all tar archives, whereas imagewise -> webdataset requires a script (it could be done with tarp, but I had issues installing it, so it's probably better not to rely on that tool).
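As a rough illustration of how simple the webdataset -> imagewise direction is, a stdlib-only sketch that extracts every shard into one output directory; the `*.tar` naming pattern and the directory names are assumptions. The effect is the same as looping `tar -xf` over the shards.

```python
import glob
import tarfile
from pathlib import Path


def unpack_shards(shard_dir, out_dir, pattern="*.tar"):
    """Extract every .tar shard of a webdataset split into one imagewise-style directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for shard in sorted(glob.glob(str(Path(shard_dir) / pattern))):
        with tarfile.open(shard, "r") as tar:
            tar.extractall(out)  # each shard only contains per-image files


# Usage (hypothetical directories):
# unpack_shards("ycbv_train_pbr_web", "ycbv_train_pbr_imagewise")
```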

I would vote to keep the large datasets stored in webdataset format because the files to manipulate are smaller (each chunk of 1000 images is around 500 MB), compared to a single .tar of 500 GB containing all images. BOP-webdataset can just be seen as BOP-imagewise stored in a bunch of small .tar files.

> Hmm, imo the web format might be useful for the large, new training datasets where you can't load annotations/masks into memory at the start of training. But I don't see much benefit in bop-v2 over bop-v1? Yann's benchmark is on loading all files, but the masks can just be loaded quickly from the COCO gt data files (per scene) that we provide in RLE format in bop-v1. We could easily extend them to amodal masks without providing a new format.

I agree the v1/v2 naming was bad, as the formats can co-exist.

BOP-v1 has been renamed to BOP-scenewise (that's the current format, where a few per-scene files store the annotations for an entire scene). BOP-v2 has been renamed to BOP-imagewise.

I updated the text in the PR description accordingly.

MartinSmeyer commented 1 year ago

Thanks for the updates. Last nitpick: "BOP-chunkwise" could be a better name than "BOP-scenewise", as for the PBR sets the annotations are actually saved in chunks of 1000 images that contain 40 scenes with 25 renderings each. Alternatively, one could also name it "BOP-original" to make it clear.

ylabbe commented 1 year ago

Oh, I didn't even know that the PBR folders were chunks and not scenes. I think it may be good to keep the scene notation, just because the annotation files in the top-level directory are scene_camera.json, scene_gt.json, etc. Perhaps we just add somewhere in the documentation that the images in a folder may not all come from a single physical scene? I think it's important to mention, as otherwise this could affect methods that assume uniqueness, e.g. when training a multi-view method.

thodan commented 1 year ago

Thanks for resolving all the remarks! I think we are ready to merge this PR?

ylabbe commented 1 year ago

I'm fine with merging.