Closed peterdesmet closed 1 year ago
I just want to confirm the need for the bounding box info in the observations.csv table. I guess the format is rather secondary, as long as it is defined, because you can easily transform the coordinates.
Discussed with @kbubnicki
boundingBox (most recognizable)
mediaID (to zoom in further)
I recommend using the YOLO format. This way the coordinates will be independent of the image size (which can vary).
@kbubnicki for Agouti, we would like the bounding box field to also support the [x, y] position of animals. I guess that should be possible in YOLO format ([x_center, y_center, width, height]) by having it as x, y, 0, 0?
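To illustrate the point about easy transformation, here is a minimal sketch of converting a normalized, center-based box to pixel coordinates. The function name `yolo_to_pixels` and the (left, top, right, bottom) output order are assumptions made for this example, not part of any proposal in this thread.

```python
def yolo_to_pixels(x_center, y_center, width, height, image_width, image_height):
    """Convert a normalized, center-based box to pixel (left, top, right, bottom).

    All four box values are fractions of the image size (0 to 1).
    The function name and output order are hypothetical, for illustration only.
    """
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return left, top, right, bottom


# A box covering the central quarter of a 1920x1080 image.
box = yolo_to_pixels(0.5, 0.5, 0.5, 0.5, 1920, 1080)

# A bare position (x, y, 0, 0) collapses to a single pixel location.
point = yolo_to_pixels(0.25, 0.75, 0, 0, 1000, 800)
```

With zero width and height, the left/right and top/bottom values coincide, which is how the x, y, 0, 0 idea would represent a position without a box.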
@danstowell in reply to https://github.com/tdwg/camtrap-dp/pull/314#issuecomment-1561610637, if you want to classify a media file containing 3 sparrows with bounding boxes, you would have the following 3 observations:
observationID | mediaID | scientificName | start | end | boundingBox |
---|---|---|---|---|---|
obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x1, y1, width1, height1] |
obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x2, y2, width2, height2] |
obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x3, y3, width3, height3] |
Alternatively, we could store bounding box data in 4 separate columns, thus enforcing exactly one bounding box per observation row:
observationID | mediaID | scientificName | start | end | bboxX | bboxY | bboxWidth | bboxHeight |
---|---|---|---|---|---|---|---|---|
obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x1 | y1 | width1 | height1 |
obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x2 | y2 | width2 | height2 |
obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x3 | y3 | width3 | height3 |
@danstowell I remember your comment about storing structured data within a CSV cell. What do you think?
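For comparison, a rough sketch of what a consumer would do with each of the two layouts. The sample values and the helper names `read_compound` and `read_split` are made up for illustration; the sketch also assumes the compound cell is written as a JSON array, which is only one possible serialization.

```python
import csv
import io
import json

# Compound column: the cell contains commas, so it must be quoted in the CSV.
compound_csv = 'observationID,boundingBox\nobs1,"[0.5, 0.5, 0.2, 0.1]"\n'

# Separate columns: plain numeric cells, no nested parsing needed.
split_csv = "observationID,bboxX,bboxY,bboxWidth,bboxHeight\nobs1,0.5,0.5,0.2,0.1\n"


def read_compound(text):
    """Parse the structured cell as JSON (assumed serialization)."""
    reader = csv.DictReader(io.StringIO(text))
    return [json.loads(row["boundingBox"]) for row in reader]


def read_split(text):
    """Read the four numeric columns directly."""
    fields = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")
    reader = csv.DictReader(io.StringIO(text))
    return [[float(row[name]) for name in fields] for row in reader]


print(read_compound(compound_csv))
print(read_split(split_csv))
```

Both calls yield the same four numbers per observation; the difference is that the compound layout needs a second parsing step and careful quoting.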
The format would be:
[
  {
    "name": "bboxX",
    "description": "The relative X coordinate of a bounding box center, normalized to the image width.",
    "type": "number",
    "constraints": {
      "required": false,
      "minimum": 0,
      "maximum": 1
    },
    "example": 0.5
  },
  {
    "name": "bboxY",
    "description": "The relative Y coordinate of a bounding box center, normalized to the image height.",
    "type": "number",
    "constraints": {
      "required": false,
      "minimum": 0,
      "maximum": 1
    },
    "example": 0.5
  },
  {
    "name": "bboxWidth",
    "description": "The relative width of a bounding box, normalized to the image width.",
    "type": "number",
    "constraints": {
      "required": false,
      "minimum": 0,
      "maximum": 1
    },
    "example": 0.5
  },
  {
    "name": "bboxHeight",
    "description": "The relative height of a bounding box, normalized to the image height.",
    "type": "number",
    "constraints": {
      "required": false,
      "minimum": 0,
      "maximum": 1
    },
    "example": 0.5
  }
]
It is YOLO format (also suggested by @ddachs). The advantage of this format (i.e. coordinates of the center instead of e.g. the upper-left corner) is that the bboxX and bboxY columns can be used to store information on the relative position of an animal on an image (e.g. estimated using image-calibration methods for distance sampling applications) without defining an entire bounding box. Then bboxWidth and bboxHeight are simply zeros.
I like that approach.
Yes, this is indeed a bit clearer. I wasn't planning to comment on that aspect though, because I don't know which of those two options (i.e. single compound column, or separated into columns) will be easier for your target users to produce/consume. If it matches YOLO format then that's an argument in support of it.
Within AudioVisual Core we specified something similar except it was a top-left corner. I rather wish the centrepoint had been an option we considered, since it has some handy properties. (I note also that in AC, zero-sized rectangles are explicitly disallowed, though zero-sized circles are to be used instead! So that's compatible.)
Thanks @danstowell! Given that AudioVisual Core adopted top-left corner we might consider that too ... so we can reference the terms?
bboxX -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/xFrac
bboxY -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/yFrac
@danstowell @kbubnicki Or would you advise against that?
Note: the advantage of splitting into columns is that validation is easier to write (e.g. x should be between 0 and 1).
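As a rough illustration of that validation, assuming each observation row is read as a dict of strings (e.g. from `csv.DictReader`) and that empty cells are allowed because the fields are not required. The helper name `validate_bbox` is hypothetical.

```python
BBOX_FIELDS = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")


def validate_bbox(row):
    """Return a list of error messages for one observation row (a dict)."""
    errors = []
    for field in BBOX_FIELDS:
        raw = row.get(field, "")
        if raw in ("", None):
            continue  # the fields are optional, so empty is fine
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors.append(f"{field}: not a number: {raw!r}")
            continue
        # Each value is a fraction of the image size.
        if not 0 <= value <= 1:
            errors.append(f"{field}: {value} is outside [0, 1]")
    return errors


ok = validate_bbox({"bboxX": "0.5", "bboxY": "0.5", "bboxWidth": "0.2", "bboxHeight": "0.1"})
bad = validate_bbox({"bboxX": "1.2", "bboxY": "abc"})
```

With a single compound column, the same checks would first require parsing the cell, which is the extra step the column split avoids.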
@danstowell @baskaufs I'd like to know how we should reference the AC terms and how important the AC Notes are.
For example, our bboxWidth follows the definition of http://rs.tdwg.org/ac/terms/widthFrac exactly:
The width of the bounding rectangle, expressed as a decimal fraction of the width of the media item.
But we might allow 0 widths, which contradicts the notes of http://rs.tdwg.org/ac/terms/widthFrac:
Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.
Is our bboxWidth then still an exact match, or is it broader (because we allow more)?
Update based on #323
@peterdesmet Cool. Prior to adopting the AC terms, we looked at a number of systems for defining bounding boxes. Most (nearly all?) had 0,0 as the upper left corner. So following that convention simplifies the conversion to other systems.
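For reference, switching between the two conventions is a half-width/half-height shift. A minimal sketch, assuming all values are fractions of the image size with 0,0 at the upper-left corner; the function names are made up for this example.

```python
def center_to_top_left(x_center, y_center, width, height):
    """Center-based box -> top-left-based box (same width and height)."""
    return x_center - width / 2, y_center - height / 2, width, height


def top_left_to_center(x_left, y_top, width, height):
    """Top-left-based box -> center-based box (same width and height)."""
    return x_left + width / 2, y_top + height / 2, width, height


corner_box = center_to_top_left(0.5, 0.5, 0.5, 0.25)
center_box = top_left_to_center(*corner_box)
```

For a zero-sized box the two conventions give identical x and y, so a bare position reads the same either way.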
See discussion in #203. Best solution was to add bounding box as a property of the observation. What isn't defined is the expected format.