tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Add bounding box #219

Closed peterdesmet closed 1 year ago

peterdesmet commented 2 years ago

See discussion in #203. Best solution was to add bounding box as a property of the observation. What isn't defined is the expected format.

ddachs commented 2 years ago

I just want to confirm the need for the bounding box info in the observations.csv table. I guess the format is rather secondary, as long as it is defined, because you can easily transform the coordinates.

peterdesmet commented 1 year ago

Discussed with @kbubnicki

ddachs commented 1 year ago

I recommend using the YOLO format. That way the coordinates are independent of the image size (which can vary).

peterdesmet commented 1 year ago

@kbubnicki for Agouti, we would like the bounding box field to also support the [x,y] position of animals. I guess that should be possible in YOLO format ([x_center, y_center, width, height]) by writing it as x, y, 0, 0?
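To make the proposal concrete, here is a minimal sketch of how a pixel-based box could be converted to the normalized [x_center, y_center, width, height] form, and how a single [x, y] position could be written as a zero-sized box. The function names and the corner-based pixel input are assumptions for illustration; they are not part of Camtrap DP or Agouti.

```python
def pixel_box_to_yolo(x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert a pixel box given by its corners to normalized center-based values.

    Hypothetical helper: the result is [x_center, y_center, width, height],
    each expressed as a fraction of the image width or height.
    """
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return [x_center, y_center, width, height]


def pixel_point_to_yolo(x, y, image_width, image_height):
    """Encode a single [x, y] pixel position as a zero-sized box (x, y, 0, 0)."""
    return [x / image_width, y / image_height, 0.0, 0.0]


# A 200 x 200 pixel box in a 400 x 500 pixel image.
box = pixel_box_to_yolo(100, 50, 300, 250, 400, 500)
# The center of that box, stored as a point.
point = pixel_point_to_yolo(200, 150, 400, 500)
```

Because every value is a fraction of the image dimensions, the same numbers remain valid if the image is later resized.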

peterdesmet commented 1 year ago

@danstowell in reply to https://github.com/tdwg/camtrap-dp/pull/314#issuecomment-1561610637, if you want to classify a media file containing 3 sparrows with bounding boxes, you would have the following 3 observations:

| observationID | mediaID | scientificName | start | end | boundingBox |
| --- | --- | --- | --- | --- | --- |
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x1, y1, width1, height1] |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x2, y2, width2, height2] |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x3, y3, width3, height3] |

kbubnicki commented 1 year ago

Alternatively, we could store the bounding box data in 4 separate columns, thus enforcing exactly one bounding box per observation row:

| observationID | mediaID | scientificName | start | end | bboxX | bboxY | bboxWidth | bboxHeight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x1 | y1 | width1 | height1 |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x2 | y2 | width2 | height2 |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x3 | y3 | width3 | height3 |

@danstowell I remember your comment about storing structured data within a CSV cell. What do you think?

kbubnicki commented 1 year ago

The format would be:

[
    {
        "name": "bboxX",
        "description": "The relative X coordinate of a bounding box center, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxY",
        "description": "The relative Y coordinate of a bounding box center, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxWidth",
        "description": "The relative width of a bounding box, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxHeight",
        "description": "The relative height of a bounding box, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    }
]

This is the YOLO format (also suggested by @ddachs). The advantage of this format (i.e. coordinates of the center instead of e.g. the upper-left corner) is that the bboxX and bboxY columns can store the relative position of an animal in an image (e.g. estimated using image-calibration methods for distance sampling applications) without defining an entire bounding box. In that case bboxWidth and bboxHeight are simply zero.
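As an illustration of how a consumer might read these four columns, the sketch below tells apart rows with no bounding box, rows that store only a position (zero width and height), and rows with a full box. The sample CSV content and the `classify_bbox` helper are hypothetical, and treating a partially filled row as an error is an assumption rather than something decided in this thread.

```python
import csv
import io

BBOX_COLUMNS = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")

# Hypothetical excerpt of an observations.csv file.
SAMPLE = """observationID,bboxX,bboxY,bboxWidth,bboxHeight
obs1,0.5,0.5,0.2,0.3
obs2,0.4,0.6,0,0
obs3,,,,
"""


def classify_bbox(row):
    """Return 'none', 'point' or 'box' for one observation row."""
    values = [row.get(name, "") for name in BBOX_COLUMNS]
    if all(value == "" for value in values):
        return "none"
    if any(value == "" for value in values):
        # Assumption: a partially filled bounding box is treated as an error.
        raise ValueError("bounding box columns must be all empty or all filled")
    _, _, width, height = (float(value) for value in values)
    if width == 0 and height == 0:
        return "point"
    return "box"


kinds = [classify_bbox(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
```

With the sample above, obs1 is read as a box, obs2 as a position only, and obs3 as having no bounding box information.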

peterdesmet commented 1 year ago

I like that approach.

danstowell commented 1 year ago

Yes, this is indeed a bit clearer. I wasn't planning to comment on that aspect though, because I don't know which of those two options (i.e. single compound column, or separated into columns) will be easier for your target users to produce/consume. If it matches YOLO format then that's an argument in support of it.

Within AudioVisual Core we specified something similar except it was a top-left corner. I rather wish the centrepoint had been an option we considered, since it has some handy properties. (I note also that in AC, zero-sized rectangles are explicitly disallowed, though zero-sized circles are to be used instead! So that's compatible.)

peterdesmet commented 1 year ago

Thanks @danstowell! Given that AudioVisual Core adopted top-left corner we might consider that too ... so we can reference the terms?

bboxX -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/xFrac
bboxY -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/yFrac

@danstowell @kbubnicki Or would you advise against that?

Note: the advantage of splitting into columns is that validation is easier to write (e.g. x should be between 0 and 1).
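A minimal sketch of what that per-column validation could look like, assuming the field definitions proposed earlier in this thread (optional numbers between 0 and 1). The `validate_row` function is a hypothetical stand-in written in plain Python; it is not the API of any Frictionless or Camtrap DP tool.

```python
# Constraints copied from the field definitions proposed in this thread.
FIELDS = [
    {"name": name, "constraints": {"required": False, "minimum": 0, "maximum": 1}}
    for name in ("bboxX", "bboxY", "bboxWidth", "bboxHeight")
]


def validate_row(row, fields=FIELDS):
    """Return a list of error messages for one row (empty list if valid)."""
    errors = []
    for field in fields:
        name = field["name"]
        constraints = field["constraints"]
        raw = row.get(name, "")
        if raw is None or raw == "":
            if constraints.get("required"):
                errors.append(f"{name}: value is required")
            continue
        try:
            value = float(raw)
        except ValueError:
            errors.append(f"{name}: {raw!r} is not a number")
            continue
        # Written as a chained comparison so that NaN is rejected as well.
        if not (constraints["minimum"] <= value <= constraints["maximum"]):
            errors.append(f"{name}: {value} is outside [0, 1]")
    return errors


valid = validate_row({"bboxX": "0.5", "bboxY": "0.5", "bboxWidth": "0.2", "bboxHeight": "0.2"})
invalid = validate_row({"bboxX": "1.2", "bboxY": "0.5", "bboxWidth": "abc", "bboxHeight": "0.2"})
```

Each column is checked independently against the same rule, which would be harder to express if all four values lived in one compound cell.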

peterdesmet commented 1 year ago

@danstowell @baskaufs I'd like to know how we should reference the AC terms and how important the AC Notes are.

For example, our bboxWidth follows the definition of http://rs.tdwg.org/ac/terms/widthFrac exactly:

The width of the bounding rectangle, expressed as a decimal fraction of the width of the media item.

But we might allow zero widths, which contradicts the notes of http://rs.tdwg.org/ac/terms/widthFrac:

Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.

Is our bboxWidth then still an exact match, or is it broader (because we allow more)?

peterdesmet commented 1 year ago

Update based on #323

baskaufs commented 1 year ago

@peterdesmet Cool. Prior to adopting the AC terms, we looked at a number of systems for defining bounding boxes. Most (nearly all?) had 0,0 as the upper left corner. So following that convention simplifies the conversion to other systems.
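For readers comparing the two conventions discussed in this thread, the following sketch converts between a center-based box and an upper-left-corner box, both expressed as fractions of the image size with the origin (0, 0) at the upper-left corner. The function names are hypothetical; only the arithmetic is the point.

```python
def center_to_top_left(x_center, y_center, width, height):
    """Convert a center-based box to an upper-left-corner box."""
    return (x_center - width / 2, y_center - height / 2, width, height)


def top_left_to_center(x, y, width, height):
    """Convert an upper-left-corner box to a center-based box."""
    return (x + width / 2, y + height / 2, width, height)


# A box covering the middle quarter of the image.
corner_box = center_to_top_left(0.5, 0.5, 0.5, 0.5)
center_box = top_left_to_center(*corner_box)

# A zero-sized box has the same x and y under either convention.
point = center_to_top_left(0.3, 0.7, 0.0, 0.0)
```

The width and height are unchanged by the conversion; only x and y shift by half the box size, so moving between conventions loses no information.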