Proposal to add terms to define spatial regions of interest within a media item

baskaufs commented 3 years ago

This proposal is the result of an extended discussion by the Maintenance Group about developing a system to define and demarcate portions of a media item. For details, see the meeting notes. For examples of how to use the new terms, see the Regions of Interest (ROI) Recipes document.

Proposed terms

Term name: ac:xFrac
Type: rdf:Property
Label: Fractional X
Definition: The horizontal position of a reference point, measured from the left side of the media item and expressed as a decimal fraction of the width of the media item.
Usage: A valid value MUST be greater than or equal to zero and less than or equal to one. The precision of this value SHOULD be great enough that when the ac:xFrac value is multiplied by the exif:PixelXDimension of the Best Quality variant of the Service Access Point, rounding to the nearest integer results in the same horizontal pixel location originally used to define the point.
Notes: This point can serve as the horizontal position of the upper left corner of a bounding rectangle, or as the center of a circle.

Term name: ac:yFrac
Type: rdf:Property
Label: Fractional Y
Definition: The vertical position of a reference point, measured from the top of the media item and expressed as a decimal fraction of the height of the media item.
Usage: A valid value MUST be greater than or equal to zero and less than or equal to one. The precision of this value SHOULD be great enough that when the ac:yFrac value is multiplied by the exif:PixelYDimension of the Best Quality variant of the Service Access Point, rounding to the nearest integer results in the same vertical pixel originally used to define the point.
Notes: This point can serve as the vertical position of the upper left corner of a bounding rectangle, or as the center of a circle.

Term name: ac:widthFrac
Type: rdf:Property
Label: Fractional Width
Definition: The width of the bounding rectangle, expressed as a decimal fraction of the width of the media item.
Usage: The sum of a valid value plus ac:xFrac MUST be greater than zero and less than or equal to one. The precision of this value SHOULD be great enough that when ac:widthFrac and ac:xFrac are used with the exif:PixelXDimension of the Best Quality variant of the Service Access Point to calculate the lower right corner of the rectangle, rounding to the nearest integer results in the same horizontal pixel originally used to define the point. This term MUST NOT be used with ac:radius to define a region of interest.
Notes: Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.

Term name: ac:heightFrac
Type: rdf:Property
Label: Fractional Height
Definition: The height of the bounding rectangle, expressed as a decimal fraction of the height of the media item.
Usage: The sum of a valid value plus ac:yFrac MUST be greater than zero and less than or equal to one. The precision of this value SHOULD be great enough that when ac:heightFrac and ac:yFrac are used with the exif:PixelYDimension of the Best Quality variant of the Service Access Point to calculate the lower right corner of the rectangle, rounding to the nearest integer results in the same vertical pixel originally used to define the point. This term MUST NOT be used with ac:radius to define a region of interest.
Notes: Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.

Term name: ac:radius
Type: rdf:Property
Label: Radius
Definition: The radius of a bounding circle or arc, expressed as a fraction of the width of the media item.
Usage: A valid value MUST be greater than or equal to zero. A valid value MAY cause the designated circle to extend beyond the bounds of the media item. In that case, the arc within the media item plus the bounds of the media item specify the region of interest. This term MUST NOT be used with ac:widthFrac or ac:heightFrac to define a region of interest.
Notes: This term may be used with ac:xFrac and ac:yFrac to define a point. In that case, the implication is that the point falls on some object of interest within the media item, but nothing more can be assumed about the bounds of that object.
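
As an illustration only, the MUST rules in the Usage statements above could be checked along the following lines. This is a minimal sketch, not part of the proposal: the function name `validate_roi` and its parameter names are hypothetical, and treating a non-positive width or height as invalid is an assumption based on the note that zero-sized bounding rectangles are not allowed.

```python
from typing import List, Optional


def validate_roi(x_frac: float, y_frac: float,
                 width_frac: Optional[float] = None,
                 height_frac: Optional[float] = None,
                 radius: Optional[float] = None) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # ac:xFrac and ac:yFrac MUST each be >= 0 and <= 1.
    for name, value in (("ac:xFrac", x_frac), ("ac:yFrac", y_frac)):
        if not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")
    # ac:radius MUST NOT be combined with ac:widthFrac or ac:heightFrac.
    if radius is not None and (width_frac is not None or height_frac is not None):
        problems.append("ac:radius must not be used with ac:widthFrac or ac:heightFrac")
    # ac:radius MUST be >= 0. There is no upper bound, because the circle
    # MAY extend beyond the bounds of the media item.
    if radius is not None and radius < 0:
        problems.append("ac:radius must be >= 0")
    for name, extent, origin in (("ac:widthFrac", width_frac, x_frac),
                                 ("ac:heightFrac", height_frac, y_frac)):
        if extent is None:
            continue
        if extent <= 0:
            # Assumption: read from the Notes (zero-sized rectangles not allowed).
            problems.append(f"{name} must be greater than zero")
        elif not 0 < extent + origin <= 1:
            # The sum of the extent and its origin MUST be > 0 and <= 1.
            problems.append(f"{name} plus its origin must be > 0 and <= 1")
    return problems
```

For example, `validate_roi(0.25, 0.25, 0.5, 0.5)` returns an empty list, while `validate_roi(0.5, 0.5, 0.25, 0.25, radius=0.1)` reports the forbidden combination of a radius with a rectangle. A real implementation might prefer `decimal.Decimal` over floats so that sums such as xFrac plus widthFrac are compared exactly.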

Rationale

These terms are described using relative rather than absolute dimensions because a Region of Interest applies to all Service Access Points defined for an abstract media item. Specifying ROIs in absolute units (i.e., pixels) would add complexity, because each region would have to be attached to a specific representation. Using fractional proportions allows a region to be defined once for a media item and still apply to multiple representations.

To determine the absolute position and bounds, multiply the relative values by the values of exif:PixelXDimension and exif:PixelYDimension for the particular Service Access Point.
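
The multiplication described above might look like the following sketch. The function names are hypothetical, and rounding ties upward is an assumption: the proposal says only "rounding to the nearest integer". The radius is scaled by the pixel width because ac:radius is defined as a fraction of the width of the media item.

```python
import math


def to_pixel(fraction: float, dimension: int) -> int:
    """Scale a fraction by a pixel dimension and round to the nearest integer."""
    # Assumption: exact ties round up; the proposal does not specify tie handling.
    return math.floor(fraction * dimension + 0.5)


def rect_to_pixels(x_frac, y_frac, width_frac, height_frac,
                   pixel_x_dim, pixel_y_dim):
    """Return (left, top, right, bottom) in pixels for one Service Access Point."""
    left = to_pixel(x_frac, pixel_x_dim)
    top = to_pixel(y_frac, pixel_y_dim)
    # The lower right corner is the upper left corner plus the fractional extent.
    right = to_pixel(x_frac + width_frac, pixel_x_dim)
    bottom = to_pixel(y_frac + height_frac, pixel_y_dim)
    return left, top, right, bottom


def circle_to_pixels(x_frac, y_frac, radius, pixel_x_dim, pixel_y_dim):
    """Return (center_x, center_y, radius) in pixels for one Service Access Point."""
    return (to_pixel(x_frac, pixel_x_dim),
            to_pixel(y_frac, pixel_y_dim),
            to_pixel(radius, pixel_x_dim))  # radius is a fraction of the width
```

The same fractional rectangle then yields pixel bounds for each size variant: `rect_to_pixels(0.25, 0.5, 0.5, 0.25, 4000, 3000)` gives `(1000, 1500, 3000, 2250)`, and the same call with an 800 x 600 variant gives `(200, 300, 600, 450)`.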

Although this proposal does not currently include terms for a third dimension (z), they could be added in the future to define three-dimensional ROIs.

afuchs1 commented 3 years ago

Hi Steve

Can I make a small suggestion: that the issue description state that it relates to regions within a media item, to differentiate this from the usual use of "spatial" as the location where the media was taken?

e.g. Proposal to add terms to define spatial regions of interest within a media item

cheers Anne

danstowell commented 3 years ago

I suggest defining somewhere which rounding mode is intended for "rounded" - I would assume rounded to the nearest integer (as opposed to rounded up, rounded down, or rounded towards zero).

baskaufs commented 3 years ago

@danstowell I revised the text to clarify this. See if it's better.

timrobertson100 commented 3 years ago

Super nit suggestion:

This term MUST NOT be used with ac:widthFrac and ac:heightFrac to define a region of interest

This term MUST NOT be used with ac:widthFrac or ac:heightFrac to define a region of interest

baskaufs commented 3 years ago

We love nit suggestions!

I was pondering whether it would be possible to use only one of ac:widthFrac or ac:heightFrac without the other. I guess we don't prohibit that, but I'm not sure that it would be meaningful since it wouldn't bound a box. So I think that's why it seemed to make sense to use "and" originally.

But I also don't think there is any harm with using "or", which would probably be simpler and cleaner.

baskaufs commented 3 years ago

Updated original proposal to include @timrobertson100's suggestion to use "or" instead of "and".

ben-norton commented 3 years ago

Machines will have a challenging time utilizing relative positioning. This may also be problematic for interoperability with GIS formats and processes. For example, this solution isn't interoperable with geotiffs, where a raster image has been georeferenced using absolute positioning. Computer vision processes bounding boxes using absolute positioning (x,y coordinates). The problem is certainly a challenging one. I need to test it further, but I would have a hard time using relative positioning for any hard data processing such as GIS or Computer Vision. With that said, the alternative may be just as problematic. However, I can say that preservation of the original image that has been annotated with a region of interest is critically important for reuse. There's an incentive to preserve it.

baskaufs commented 3 years ago

Hi @ben-norton. Thanks for your thoughtful comments about the proposal and for taking the time to make them.

I just wanted to note that the primary purpose of Audubon Core is as a data exchange standard to facilitate discovery, to evaluate fitness-for-use, and to lower the barrier to gathering and serving multimedia resources (see the Motivation and Rationale behind Audubon Core). As such, it doesn't prescribe how providers and consumers maintain their own databases (i.e. their own field names, data format, etc.). Thus there is not necessarily an assumption that an Audubon Core record could be produced or consumed directly by users without some processing to make it conform to the term names and structure specified by the standard.

Given that the transformation from relative to absolute coordinates involves a single multiplication or division, can you elaborate more about the problems you foresee in making the transformation between absolute and relative coordinates? The proposal does specify that the precision of the relative values should be great enough that the exact pixel values could be reconstructed for the highest resolution image available. Thus the process should not be lossy if this prescription is followed.
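
To make the precision point concrete, here is a rough sketch of a round-trip check. The helper names are hypothetical, ties are assumed to round up, and the brute-force search is only one way to find how many decimal places are needed for a given pixel dimension.

```python
import math


def pixel_to_frac(pixel: int, dimension: int, digits: int) -> float:
    """Express a pixel position as a fraction, kept to `digits` decimal places."""
    return round(pixel / dimension, digits)


def frac_to_pixel(fraction: float, dimension: int) -> int:
    """Multiply back by the dimension and round to the nearest integer."""
    # Assumption: exact ties round up; the proposal does not specify tie handling.
    return math.floor(fraction * dimension + 0.5)


def min_digits_for_round_trip(dimension: int, max_digits: int = 10) -> int:
    """Smallest number of decimal places at which every pixel survives a round trip."""
    for digits in range(1, max_digits + 1):
        if all(frac_to_pixel(pixel_to_frac(p, dimension, digits), dimension) == p
               for p in range(dimension + 1)):
            return digits
    raise ValueError("no precision up to max_digits is sufficient")
```

For a Best Quality variant that is 4000 pixels wide, `min_digits_for_round_trip(4000)` returns 4: with three decimal places, pixel 1 becomes 0.0 and is lost, while four decimal places keep the error below half a pixel.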

tucotuco commented 3 years ago

Machines will have a challenging time utilizing relative positioning. This may also be problematic for interoperability with GIS formats and processes. For example, this solution isn't interoperable with geotiffs, where a raster image has been georeferenced using absolute positioning. Computer vision processes bounding boxes using absolute positioning (x,y coordinates). The problem is certainly a challenging one. I need to test it further, but I would have a hard time using relative positioning for any hard data processing such as GIS or Computer Vision. With that said, the alternative may be just as problematic. However, I can say that preservation of the original image that has been annotated with a region of interest is critically important for reuse. There's an incentive to preserve it.

@ben-norton I am curious why you say this solution is not interoperable with GeoTIFFs, or why it would be expected to be. The information embedded in the GeoTIFF allows that image to be aligned properly with other layers in GIS. The RoI here isn't expected to be used to create a georeference (sensu Chapman & Wieczorek 2020) without someone creating a tool to do so, but that seems way out of scope anyway. Can you elaborate? And can you also indicate if your comments constitute an objection to adopting the terms?

ben-norton commented 3 years ago

Machines will have a challenging time utilizing relative positioning. This may also be problematic for interoperability with GIS formats and processes. For example, this solution isn't interoperable with geotiffs, where a raster image has been georeferenced using absolute positioning. Computer vision processes bounding boxes using absolute positioning (x,y coordinates). The problem is certainly a challenging one. I need to test it further, but I would have a hard time using relative positioning for any hard data processing such as GIS or Computer Vision. With that said, the alternative may be just as problematic. However, I can say that preservation of the original image that has been annotated with a region of interest is critically important for reuse. There's an incentive to preserve it.

@ben-norton I am curious why you say this solution is not interoperable with GeoTIFFs, or why it would be expected to be. The information embedded in the GeoTIFF allows that image to be aligned properly with other layers in GIS. The RoI here isn't expected to be used to create a georeference (sensu Chapman & Wieczorek 2020) without someone creating a tool to do so, but that seems way out of scope anyway. Can you elaborate? And can you also indicate if your comments constitute an objection to adopting the terms?

@tucotuco

My comments shouldn't be interpreted as an objection to the adoption. I joined this conversation at the last minute, which forgoes any standing to make a formal objection. I'm not going to derail the substantial amount of work and discussion that predates my participation.
I do have questions. Some of these are based on assumptions (see item 1) that should be clarified. Under ideal circumstances, areas of interest on an image are defined by absolute positioning. This leaves no ambiguity or distortion. The problem is that absolute coordinates are only relevant within the original context. If you don't have access to the original image, absolute positions are no longer useful. In general, relative positions have a lower value than absolute ones. However, in workflows with a risk of losing the original image, relative positions may be a necessary compromise to prevent the worst-case scenario of data loss. Based on that logic alone, relative coordinates make sense. This is where my questions come in: situations where this compromise doesn't work. Your response and linked publication make it clear that georeferenced raster images are not an issue. Bounding boxes for computer vision may still be an issue. I don't think this is sufficient grounds to object, but the shortcomings of the proposed solution are worth noting.

tucotuco commented 3 years ago

I am curious, but not proposing, whether a combination of absolute coordinates of the original source and dimensions of the original source would overcome the issues you have identified. In that alternate view, the quantities that are being proposed could be calculated for any resolution derivative that maintains the same limits of content (not cropped).

baskaufs commented 3 years ago

I've been holding off on closing the comment period until we've determined that @ben-norton's comment didn't constitute an objection. It appears that it doesn't, so we'll move forward in the process.

I should also mention that this thread has caused the AC Maintenance Group to have some further discussions about how absolute coordinate terms could be accommodated if there was sufficient demand for them. I put together a document that looks at how absolute coordinates might fit into the picture if we had terms for them. The complication comes from the Audubon Core model, which differentiates between an abstract media item and service access points that can represent size variants of the abstract media item. Without that complication, the situation is simple (Strategy 2 in the document, basically what @tucotuco was talking about), but with it the situation gets messier (Strategies 3 and 4).

Anyway, we have that document and this discussion on the shelf for future reference if we come back to the issue of absolute coordinates.

baskaufs commented 3 years ago

Updated proposal to fix incorrect capitalization of rdf:Property

baskaufs commented 3 years ago

Updated proposal to correct error in Usage of ac:heightFrac:

The sum of a valid value plus ac:xFrac MUST be greater than zero and less than or equal to one...

changed to

The sum of a valid value plus ac:yFrac MUST be greater than zero and less than or equal to one...

ben-norton commented 3 years ago

I am curious, but not proposing, if a combination of absolute coordinates of original source and dimensions of original source would overcome the issues you have identified. In that alternate view, the quantities that are being proposed could be calculated for any resolution derivative that maintains the same limits of content (not cropped).

I realize that comments are closed, but just for the record: @tucotuco, yes. An alternative to the original dimensions field is an absolute reference to the original image. Although less likely than a resize due to basic mechanics, cropping an image would render both absolute and relative positioning obsolete. An absolute persistent reference to the original image prevents this from occurring.

baskaufs commented 2 years ago

Ratified by the Executive on 2021-10-05 and implemented in https://github.com/tdwg/rs.tdwg.org/pull/79 and https://github.com/tdwg/ac/pull/212

tdwg / ac

Proposal to add terms to define spatial regions of interest within a media item #207
