mlcommons / croissant

Croissant is a high-level format for machine learning datasets that brings together four rich layers.
https://mlcommons.org/croissant
Apache License 2.0

Supporting many bounding boxes within an image #673

Open Irenetema opened 3 months ago

Irenetema commented 3 months ago

I'm working on a subset of the Snapshot Serengeti camera-trap dataset, where each image contains multiple animals, each with a bounding box marking its location. The current implementation of the BOUNDING_BOX type does not seem to support more than one box per image. Is it possible to support multiple bounding boxes?

One solution could be to let Croissant's BOUNDING_BOX type accept a list of lists, one inner list per bounding box.
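As a minimal sketch of that list-of-lists idea, a per-image `bboxes` object keyed by string indices (`"0"`, `"1"`, ...) can be turned into a plain list of `[x, y, w, h]` lists. The function name `bboxes_to_list` is hypothetical, and sorting by integer key assumes the keys are numeric strings.

```python
from __future__ import annotations


def bboxes_to_list(bboxes: dict[str, list[float]]) -> list[list[float]]:
    """Convert {"0": [...], "1": [...]} into [[...], [...]], ordered by numeric key."""
    # Sort numerically so that "10" comes after "9" rather than after "1".
    return [list(bboxes[key]) for key in sorted(bboxes, key=int)]


example = {
    "0": [0.7353, 0.2858, 0.2646, 0.608],
    "1": [0.3205, 0.0633, 0.3531, 0.8798],
}
print(bboxes_to_list(example))
# [[0.7353, 0.2858, 0.2646, 0.608], [0.3205, 0.0633, 0.3531, 0.8798]]
```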

Thanks, --Tema

benjelloun commented 3 months ago

Hi Irene,

Can you try marking the field with the bounding box as "repeated": "true". This is how you can specify that you have multiple bounding boxes for an image in the Croissant format. That said, I don't know if the mlcroissant python library supports reading repeated fields from JSON at the moment. @marcenacp
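For illustration, a record set with a repeated bounding-box field might look roughly like the following, written as a Python dict and serialized to JSON-LD (with `repeated` as the JSON boolean `true`). This is a sketch, not a validated metadata file: the record-set and field `@id`s, the file name `annotations.json`, and the JSONPath expressions are hypothetical, and it assumes the annotation file has been reshaped into a top-level list of objects whose `bboxes` value is a list of boxes. The property names are assumed to follow the Croissant 1.0 vocabulary; check them against the specification and the wiki-text example referenced in this thread.

```python
import json

# Hypothetical record set: one record per image, with a repeated
# bounding-box field holding every box for that image.
record_set = {
    "@type": "cr:RecordSet",
    "@id": "images",
    "name": "images",
    "field": [
        {
            "@type": "cr:Field",
            "@id": "images/image_path",
            "name": "image_path",
            "dataType": "sc:Text",
            "source": {
                "fileObject": {"@id": "annotations.json"},
                "extract": {"jsonPath": "$[*].image_path"},
            },
        },
        {
            "@type": "cr:Field",
            "@id": "images/bboxes",
            "name": "bboxes",
            "dataType": "cr:BoundingBox",
            # Marks the field as a list of bounding boxes rather than a single box.
            "repeated": True,
            "source": {
                "fileObject": {"@id": "annotations.json"},
                "extract": {"jsonPath": "$[*].bboxes"},
            },
        },
    ],
}

print(json.dumps(record_set, indent=2))
```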

Best, Omar

ccl-core commented 3 months ago

Yes, it does: mlcroissant supports reading repeated fields.

See an example dataset here: https://github.com/mlcommons/croissant/blob/0f95e04763557929e4f4c6711c108c0d9cf7b818/datasets/1.0/wiki-text/metadata.json#L138

Irenetema commented 3 months ago

Thanks for your prompt reply @benjelloun.

@ccl-core I can see "repeated": "True" in the JSON file you referenced, but how do I implement this using the Python library?

https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/recipes/bounding-boxes.ipynb

How do I modify this COCO example if one image has two bounding boxes and another has only one?
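One way to see what a repeated field changes is to compare two record layouts for exactly this case. COCO-style annotations store one entry per box, so an image with two boxes appears twice; a repeated field instead describes one record per image holding a variable-length list of boxes. The sketch below, with hypothetical file names, box values, and helper name `group_boxes`, converts the first layout into the second.

```python
from collections import defaultdict

# Hypothetical flat layout: one row per bounding box, image repeated per row.
rows = [
    {"image": "a.jpg", "bbox": [0.10, 0.20, 0.30, 0.40]},
    {"image": "a.jpg", "bbox": [0.50, 0.10, 0.20, 0.20]},
    {"image": "b.jpg", "bbox": [0.00, 0.00, 0.50, 0.50]},
]


def group_boxes(rows):
    """Group per-box rows into one record per image with a list of boxes."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["image"]].append(row["bbox"])
    # Dicts keep insertion order, so images appear in first-seen order.
    return [{"image": image, "bboxes": boxes} for image, boxes in grouped.items()]


for record in group_boxes(rows):
    print(record["image"], len(record["bboxes"]))
# a.jpg 2
# b.jpg 1
```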

Suppose the following is my JSON annotation file, which I would like to describe and read with Croissant.

{
  "0": {
    "image_path": "S6/P07/P07_R2/S6_P07_R2_IMAG0077.JPG", 
    "width": 2048, 
    "height": 1536, 
    "animal_name": "eland",  
    "animal_count": 1, 
    "bboxes": {
      "0": [0.5068, 0.008463, 0.4931, 0.9388]}
  }, 
  "1": {
    "image_path": "S2/U10/U10_R2/S2_U10_R2_IMAG0661.JPG", 
    "width": 2048, 
    "height": 1536, 
    "animal_name": "eland", 
    "animal_count": 2,  
    "bboxes": {
      "0": [0.7353, 0.2858, 0.2646, 0.608], 
      "1": [0.3205, 0.0633, 0.3531, 0.8798]
    }
  }
}

How can I read and iterate over each image and its bounding boxes using the mlcroissant Python library?
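As a stopgap, a plain-Python sketch (standard library only, not mlcroissant) can iterate over the annotation shown above. It assumes each box is `[x, y, width, height]` normalized to the image size, which the sample values suggest (0.5068 + 0.4931 ≈ 1) but the thread does not state; the helper name `iter_boxes` is hypothetical. With a Croissant file that declares a repeated field, the mlcroissant entry point would be `mlc.Dataset(jsonld=...)` followed by `.records(record_set=...)`, which would be expected to yield one record per image with the boxes as a list.

```python
import json

ANNOTATION = """
{
  "0": {"image_path": "S6/P07/P07_R2/S6_P07_R2_IMAG0077.JPG", "width": 2048, "height": 1536,
        "animal_name": "eland", "animal_count": 1,
        "bboxes": {"0": [0.5068, 0.008463, 0.4931, 0.9388]}},
  "1": {"image_path": "S2/U10/U10_R2/S2_U10_R2_IMAG0661.JPG", "width": 2048, "height": 1536,
        "animal_name": "eland", "animal_count": 2,
        "bboxes": {"0": [0.7353, 0.2858, 0.2646, 0.608],
                   "1": [0.3205, 0.0633, 0.3531, 0.8798]}}
}
"""


def iter_boxes(annotations):
    """Yield (image_path, list of pixel boxes) for each image, in numeric key order."""
    for image_key in sorted(annotations, key=int):
        entry = annotations[image_key]
        w, h = entry["width"], entry["height"]
        boxes = []
        for box_key in sorted(entry["bboxes"], key=int):
            x, y, bw, bh = entry["bboxes"][box_key]
            # Assumption: [x, y, width, height] normalized to [0, 1].
            boxes.append([round(x * w), round(y * h), round(bw * w), round(bh * h)])
        yield entry["image_path"], boxes


annotations = json.loads(ANNOTATION)
for path, boxes in iter_boxes(annotations):
    print(path, boxes)
```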