Add bounding box

peterdesmet commented 2 years ago

See discussion in #203. Best solution was to add bounding box as a property of the observation. What isn't defined is the expected format.

ddachs commented 2 years ago

I just want to confirm the need for the bounding box info in the observations.csv table. I guess the format is rather secondary, as long as it is defined, because you can easily transform the coordinates.

peterdesmet commented 1 year ago

Discussed with @kbubnicki

Name term boundingBox (most recognizable)
Insert right after mediaID (to zoom in further)
Only use it for media-observations table (not event-observations)
Definition to be provided
Recommended format to be provided

ddachs commented 1 year ago

I recommend the YOLO format to be used. This way the coordinates will be independent of the image size (which can vary)
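To illustrate why normalized coordinates are independent of image size, here is a minimal sketch. It assumes the YOLO-style order [x_center, y_center, width, height] with every value expressed as a fraction of the image dimensions; the function name `yolo_to_pixels` and the sample values are hypothetical.

```python
def yolo_to_pixels(box, image_width, image_height):
    """Scale a normalized [x_center, y_center, width, height] box to pixels.

    Assumes all four values are fractions of the image size (0 to 1).
    """
    x_center, y_center, width, height = box
    return (
        x_center * image_width,
        y_center * image_height,
        width * image_width,
        height * image_height,
    )


# The same normalized box describes the same region at any resolution.
box = [0.5, 0.5, 0.25, 0.5]  # made-up example values
small = yolo_to_pixels(box, 640, 480)
large = yolo_to_pixels(box, 1920, 1080)
print(small)
print(large)
```

Only the image dimensions change between the two calls; the stored box stays the same.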

peterdesmet commented 1 year ago

@kbubnicki for Agouti, we would like the bounding box field to also support the [x, y] position of animals. I guess that should be possible in YOLO format ([x_center, y_center, width, height]) by storing it as [x, y, 0, 0]?

peterdesmet commented 1 year ago

@danstowell in reply to https://github.com/tdwg/camtrap-dp/pull/314#issuecomment-1561610637, if you want to classify a media file containing 3 sparrows with bounding boxes, you would have the following 3 observations:

observationID	mediaID	scientificName	start	end	boundingBox
obs1	med1	Passer domesticus	2020-08-02T05:00:15Z	2020-08-02T05:00:15Z	`[x1, y1, width1, height1]`
obs2	med1	Passer domesticus	2020-08-02T05:00:15Z	2020-08-02T05:00:15Z	`[x2, y2, width2, height2]`
obs3	med1	Passer domesticus	2020-08-02T05:00:15Z	2020-08-02T05:00:15Z	`[x3, y3, width3, height3]`

kbubnicki commented 1 year ago

Alternatively, we could store the bounding box data in 4 separate columns, thus enforcing exactly one bounding box per observation row:

observationID	mediaID	scientificName	start	end	bboxX	bboxY	bboxWidth	bboxHeight
obs1	med1	Passer domesticus	2020-08-02T05:00:15Z	2020-08-02T05:00:15Z	x1	y1	width1	height1
obs2	med1	Passer domesticus	2020-08-02T05:00:15Z	2020-08-02T05:00:15Z	x2	y2	width2	height2
obs3	med1	Passer domesticus	2020-08-02T05:00:15Z	2020-08-02T05:00:15Z	x3	y3	width3	height3
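A rough sketch of how a consumer might read the four-column layout and collect all boxes for one media file. It assumes tab-separated input as shown above; the numeric values in `SAMPLE` and the function name `boxes_per_media` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

BBOX_COLUMNS = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")

# Hypothetical sample: three observations on the same media file.
SAMPLE = (
    "observationID\tmediaID\tscientificName\tbboxX\tbboxY\tbboxWidth\tbboxHeight\n"
    "obs1\tmed1\tPasser domesticus\t0.2\t0.3\t0.1\t0.1\n"
    "obs2\tmed1\tPasser domesticus\t0.5\t0.5\t0.2\t0.1\n"
    "obs3\tmed1\tPasser domesticus\t0.8\t0.4\t0.1\t0.2\n"
)


def boxes_per_media(tsv_text):
    """Group (observationID, box) pairs by mediaID."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # One row holds exactly one box, so no cell-level parsing is needed.
        box = tuple(float(row[column]) for column in BBOX_COLUMNS)
        grouped[row["mediaID"]].append((row["observationID"], box))
    return dict(grouped)


result = boxes_per_media(SAMPLE)
print(result["med1"])
```

With separate columns each value is a plain number, whereas a compound `[x, y, width, height]` cell would need an extra parsing step.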

@danstowell I remember your comment about storing structured data within a CSV cell. What do you think?

kbubnicki commented 1 year ago

The format would be:

[
    {
        "name": "bboxX",
        "description": "The relative X coordinate of a bounding box center, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxY",
        "description": "The relative Y coordinate of a bounding box center, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxWidth",
        "description": "The relative width of a bounding box, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxHeight",
        "description": "The relative height of a bounding box, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    }
]

This is the YOLO format (also suggested by @ddachs). The advantage of this format (i.e. coordinates of the center instead of e.g. the upper-left corner) is that the bboxX and bboxY columns can store the relative position of an animal in an image (e.g. estimated using image-calibration methods for distance sampling applications) without defining an entire bounding box. In that case bboxWidth and bboxHeight are simply zero.

peterdesmet commented 1 year ago

I like that approach.

danstowell commented 1 year ago

Yes, this is indeed a bit clearer. I wasn't planning to comment on that aspect though, because I don't know which of those two options (i.e. single compound column, or separated into columns) will be easier for your target users to produce/consume. If it matches YOLO format then that's an argument in support of it.

Within AudioVisual Core we specified something similar except it was a top-left corner. I rather wish the centrepoint had been an option we considered, since it has some handy properties. (I note also that in AC, zero-sized rectangles are explicitly disallowed, though zero-sized circles are to be used instead! So that's compatible.)

peterdesmet commented 1 year ago

Thanks @danstowell! Given that AudioVisual Core adopted the top-left corner, we might consider that too, so we can reference the terms?

bboxX -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/xFrac
bboxY -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/yFrac

@danstowell @kbubnicki Or would you advise against that?

Note: the advantage of splitting into columns is that validation is easier to write (e.g. x should be between 0 and 1).
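As an illustration of that kind of validation, a minimal sketch that checks each of the four columns against the 0 to 1 range from the proposed field definitions. The function name `validate_bbox` and the error message wording are hypothetical, and empty values are skipped because the fields are not required.

```python
BBOX_FIELDS = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")


def validate_bbox(row):
    """Return a list of error messages for one observation row (a dict)."""
    errors = []
    for name in BBOX_FIELDS:
        value = row.get(name)
        if value is None or value == "":
            continue  # the fields are optional ("required": false)
        try:
            number = float(value)
        except (TypeError, ValueError):
            errors.append(f"{name}: not a number: {value!r}")
            continue
        # Range taken from the proposed constraints: minimum 0, maximum 1.
        if not 0 <= number <= 1:
            errors.append(f"{name}: {number} is outside [0, 1]")
    return errors


good = validate_bbox(
    {"bboxX": "0.5", "bboxY": "0.5", "bboxWidth": "0.2", "bboxHeight": "0.1"}
)
bad = validate_bbox(
    {"bboxX": "0.5", "bboxY": "1.2", "bboxWidth": "", "bboxHeight": "abc"}
)
print(good)
print(bad)
```

Each column is checked on its own, which is harder to express when all four numbers share a single cell.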

peterdesmet commented 1 year ago

@danstowell @baskaufs I'd like to know how we should reference the AC terms and how important the AC Notes are.

For example, our bboxWidth follows the definition of http://rs.tdwg.org/ac/terms/widthFrac exactly:

The width of the bounding rectangle, expressed as a decimal fraction of the width of the media item.

But we might allow 0 widths, which conflicts with the notes of http://rs.tdwg.org/ac/terms/widthFrac:

Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.

Is our bboxWidth then still an exact match, or is it broader (because we allow more)?

peterdesmet commented 1 year ago

Update based on #323

We have now adopted the top-left corner rather than the center. This aligns with the MegaDetector format and AC.
We don't allow 0 values anymore.
AC terms are broader than Camtrap DP terms, because the bounding boxes should encompass observed individuals, not just any object.
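For anyone holding center-based (YOLO-style) boxes, the switch to a top-left origin is a small conversion. A minimal sketch, assuming all values are normalized fractions and that x and y grow from the upper-left corner of the image; the function names are hypothetical.

```python
def center_to_top_left(x_center, y_center, width, height):
    """Convert a center-based normalized box to a top-left-based one."""
    return (x_center - width / 2, y_center - height / 2, width, height)


def top_left_to_center(x, y, width, height):
    """Inverse conversion, back to a center-based box."""
    return (x + width / 2, y + height / 2, width, height)


# Made-up example: a box centered in the image, half as wide as the image.
centered = (0.5, 0.5, 0.5, 0.25)
top_left = center_to_top_left(*centered)
round_trip = top_left_to_center(*top_left)
print(top_left)
print(round_trip)
```

Width and height are unchanged by the conversion; only the reference point moves by half the box size in each direction.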

baskaufs commented 1 year ago

@peterdesmet Cool. Prior to adopting the AC terms, we looked at a number of systems for defining bounding boxes. Most (nearly all?) had 0,0 as the upper left corner. So following that convention simplifies the conversion to other systems.

tdwg / camtrap-dp

Add bounding box #219