tdwg / ac

Audiovisual Core
http://www.tdwg.org/standards/638
Creative Commons Attribution 4.0 International

Term needed to designate when an image or ROI is a label #200

Closed baskaufs closed 2 months ago

baskaufs commented 3 years ago

This came up in assembling the use cases for the Views TG, but was determined to be out of scope for that group. However, there is a need for it and it is related to the new work on Regions Of Interest since an ROI could be flagged as label and not part of an organism.

edwbaker commented 2 years ago

Also relevant to audio - there are plenty of recordings on BioAcoustica with spoken introductions (some are already annotated as such).

edwbaker commented 2 years ago

Was there consideration of this going in subjectPart?

baskaufs commented 2 years ago

We talked about it. However, I think there is an understanding that the "subject" of subjectPart is an organism. In the task group, we've been trying to group all of the parts in collections that include parts that are appropriate for particular organism groups. So I think we just decided this was out of scope.

However, it's an important thing and maybe we need to broaden the concept of "subject" to include things like labels and sound descriptions. If it doesn't go in "subjectPart", then where does it go?

I'm going to be focused on the Views TG for the immediate future, so I'll keep this issue in mind.

deepreef commented 2 years ago

> However, I think there is an understanding that the "subject" of subjectPart is an organism.

In principle I agree. However, there is a subtle but important aspect of this that should be considered. Technically, in this context, the media documents not just an Organism, but an Organism at a particular place and a particular time. Another way of saying this is that it documents an Organism at an Event. And, by extension, yet another way of saying this is that it documents an Occurrence.

This seems abstract, but ultimately it fits into the idea that media can serve as evidence of occurrence.

If there is a spoken introduction included on the audio recording, then that also represents evidence of occurrence (i.e., of an organism identifiable as Homo sapiens, at whatever time/place the spoken commentary was recorded). Granted, that's not the sort of Occurrence record we in TDWG land tend to care much about, but fundamentally/informatically it's the same thing.

edwbaker commented 2 years ago

Just to list some potential usage examples - and potentially expand the scope.

  1. I want to find images of an organism (not a label) to show what that organism looks like.
  2. I specifically want labels to do something (maybe some ML).
  3. I have a photo of a specimen and a label and I want to know where the organism/label/ruler/colour bar is.
  4. I have a sound recording with spoken metadata and I want to remove that to run an analysis.

An example of an image with lots of non-organism regions: https://data.nhm.ac.uk/media/334690/preview

Perhaps a term called something like contents with a new controlled vocabulary that explicitly states what is in an image or ROI. (This might also solve https://github.com/tdwg/ac/issues/166 if the contents could be 'organism in habitat' or 'habitat' but my thinking isn't too clear on this yet).
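
The first two use cases above amount to filtering regions by what they contain. A minimal Python sketch, assuming a hypothetical `contents` key on each ROI and made-up vocabulary values (`organism`, `label`, `ruler`, `colour bar`); none of these are existing ac terms:

```python
# Hypothetical ROI records. The "contents" key and its values are
# assumptions for illustration, not existing Audiovisual Core terms.
rois = [
    {"id": "#roi1", "contents": "organism"},
    {"id": "#roi2", "contents": "label"},
    {"id": "#roi3", "contents": "ruler"},
    {"id": "#roi4", "contents": "colour bar"},
]


def rois_with_contents(rois, wanted):
    """Return the ROIs whose 'contents' value is in the wanted set."""
    return [roi for roi in rois if roi.get("contents") in wanted]


# Use case 1: only regions that show the organism.
organism_rois = rois_with_contents(rois, {"organism"})
# Use case 2: only labels, e.g. as input for OCR or ML.
label_rois = rois_with_contents(rois, {"label"})
```

Use case 3 would additionally need each ROI to carry its position within the image, which this sketch leaves out.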

deepreef commented 2 years ago

I have a (potentially stupid and/or answered) question: Did we agree that every image/video/media item automatically gets assigned a "default" ROI representing the entirety of its content (e.g., as the "root ROI" or something like that)? I mean, just by virtue of someone sharing the media item in the first place implies that something about its content is "of interest" -- so we should presumably assume that the entire media item is, itself, an ROI.

This question was prompted by the post from @edwbaker referencing "Perhaps a term called something like contents with a new controlled vocabulary that explicitly states what is in an image or ROI." [emphasis added]

I'm just wondering if all properties related to content depicted/represented within any media item should be rooted to an ROI instance, rather than the media item itself. By default, any properties related to contents of a media item in general would be associated with the "root ROI" (entire media item), until a more restricted/precise ROI is defined for the media item.

I vaguely remember we discussed this on one of the calls, but I can't find my notes or any other documentation that addresses this question. Apologies if I am out of the loop...

edwbaker commented 2 years ago

In my head a ROI is a way of applying ac terms to a region (or regions) within a media item. Which might be the other way round to what you are thinking? (i.e. I could create a new image by cropping an image to a ROI; the ROI effectively defines a new media item (virtually), and inherits things like bitrate).

While there is a clear philosophical difference between the two, would they vary in a practical sense to people using ac?
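
To make the "crop to a ROI" idea concrete, here is a minimal Python sketch that turns a rectangular ROI into a pixel box. It assumes the ROI is given as fractions of the image width and height, measured from the top-left corner (in the manner of ac:xFrac, ac:yFrac, ac:widthFrac and ac:heightFrac); the function name and the image dimensions are made up for illustration:

```python
def roi_to_pixel_box(x_frac, y_frac, width_frac, height_frac,
                     image_width, image_height):
    """Convert a fractional rectangle to a (left, upper, right, lower) pixel box.

    Assumes fractions are measured from the top-left corner of the image.
    """
    left = round(x_frac * image_width)
    upper = round(y_frac * image_height)
    right = round((x_frac + width_frac) * image_width)
    lower = round((y_frac + height_frac) * image_height)
    return left, upper, right, lower


# Hypothetical 4000 x 2000 pixel image with an ROI covering a band
# in the lower middle of the frame.
box = roi_to_pixel_box(0.25, 0.5, 0.5, 0.25, 4000, 2000)
print(box)  # (1000, 1000, 3000, 1500)
```

A box in this order could then be handed to an image library's crop function to produce the derived (virtual) media item.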

deepreef commented 2 years ago

Thanks, yeah -- that's sort of what I'm thinking as well. But in my mind, some ac terms represent properties of the media item itself, and others represent properties of the content depicted within the media. I guess what I'm thinking is that an ROI is potentially a different thing (class?) from the media item, and potentially serves as a "container" for properties about specific content items. In other words, I imagine it sort of as ROI being the bridge between a media item and the things depicted or represented as content by the media item.

I guess I'm just trying to sort out in my mind how to tease apart properties that apply to a media item, vs. properties that apply to content represented within the media item. I don't see a problem with derived media items mapping to an ROI of a parent Media Item. But for years I've wrestled with how to model properties of a media item compared to properties of content items represented by the media item.

baskaufs commented 2 years ago

I've been pondering this since @edwbaker posted his earlier use cases.

It seems to me that perhaps the way to handle these questions would be to apply some generic property to the ROIs that would indicate what they represented. At first I was thinking rdf:type or some other way of indicating the kind of ROI that it was. But upon reflection, that doesn't seem right, since the type of ROI could better indicate if it were a rectangle, circle, vector outline, etc. I was also thinking about dc:/dcterms:subject, but that doesn't seem quite right (definition: "A topic of the resource."). "Depicts" seems like it might make the most sense -- that's the term used for this kind of purpose on Wikimedia Commons with their structured data. There is the property foaf:depicts, which has a bit of semantic baggage (domain: foaf:Image, but that should be OK, I think). Here's actually a really ancient example that's quite similar to what we are talking about: https://www.w3.org/wiki/ImageDescriptionRdfExamples (see the "Describing part of an image ans what is depicted by it" example, which is pretty similar to our ROI).

The thing that's a bit mushy in these examples is that in the example "FOAF Depicts to describe who and what is in the image", one of the depicted things is a particular instance of a person, while the others are generic concepts of "Hotel" and "Car". In an example of an herbarium sheet, if we said that a particular ROI depicted a label, would we want to say it depicted a particular instance of a label (which could have properties like the specific text on it), or do we want to say that it generically depicts "a label"? These seem to be two different kinds of things. In the first case, we might actually want to assign an identifier to the label so that we could link out to its OCR verbatim data and cleaned up fields. In the second case, we might have a controlled vocabulary for generic things that could be depicted on an herbarium sheet image: a label, an organism, a color scale, a ruler.

I think in the end there are multiple ways to skin this cat and perhaps we just need to experiment to see which is the most practical.

edwbaker commented 2 years ago

I'll try and describe an example.

One thing I have just noticed is ac:framerate is in the ROI vocabulary and mo:sample_rate is in Resource Creation - I will make a new issue.

baskaufs commented 2 years ago

Just pondering the examples in the ROI recipes document, I see that we used dcterms:description to indicate that the depicted things were "fruit", "mine", "song of a red-winged hawk", etc. dcterms:description expects a free text field, which would not work so well for a controlled vocabulary (potentially with an IRI value). In reviewing existing AC terms, it looks like maybe we should be using Iptc4xmpExt:CVterm, whose definition refers to "image" but could be extended to include ROIs. We could then create our own controlled vocabulary for "organism", "label", "spoken introduction", "color scale", etc.

In that case, a representation might look something like this for an audio recording:

    {
      "@id": "https://macaulaylibrary.org/asset/12345",
      "@type": "http://purl.org/dc/dcmitype/Sound",
      "dcterms:title": "ML245266991 Broad-winged Hawk Macaulay Library",
      "ac:hasROI": [
        {
          "@id": "https://macaulaylibrary.org/asset/12345#vi",
          "dcterms:description": "verbal introduction",
          "Iptc4xmpExt:CVterm": "http://rs.tdwg.org/acdepicts/value/verbalIntroduction",
          "ac:startTime": 11.2,
          "ac:endTime": 11.9
        },
        {
          "@id": "https://macaulaylibrary.org/asset/245266991#vo1",
          "dwc:scientificName": "Vireo olivaceus",
          "dcterms:description": "song of red-eyed vireo",
          "Iptc4xmpExt:CVterm": "http://rs.tdwg.org/acdepicts/value/organismSound",
          "ac:startTime": 3.0,
          "ac:endTime": 3.6
        }
      ]
    }

and this for an herbarium specimen

    {
      "@id": "https://commons.wikimedia.org/wiki/File:._Sapindus_saponaria_L._(AM_AK32952).jpg",
      "@type": "http://purl.org/dc/dcmitype/StillImage",
      "dcterms:title": "Sapindus saponaria L. (AM AK32952)",
      "ac:hasROI": [
        {
          "@id": "https://commons.wikimedia.org/wiki/File:._Sapindus_saponaria_L._(AM_AK32952).jpg#label",
          "dcterms:description": "label",
          "Iptc4xmpExt:CVterm": "http://rs.tdwg.org/acdepicts/value/label",
          "ac:xFrac": 0.28939,
          "ac:yFrac": 0.23674,
          "ac:widthFrac": 0.09066,
          "ac:heightFrac": 0.26373
        },
        {
          "@id": "https://commons.wikimedia.org/wiki/File:._Sapindus_saponaria_L._(AM_AK32952).jpg#leaf",
          "dwc:scientificName": "Sapindus saponaria",
          "dcterms:description": "leaf of Sapindus saponaria",
          "Iptc4xmpExt:CVterm": "http://rs.tdwg.org/acdepicts/value/organismPart",
          "ac:xFrac": 0.21892,
          "ac:yFrac": 0.44792,
          "ac:widthFrac": 0.28147,
          "ac:heightFrac": 0.34612
        }
      ]
    }

The controlled values for Iptc4xmpExt:CVterm are completely made up and following current practice for controlled value IRIs, they should probably have opaque local names. But this would be a way one could represent this using existing terms.
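
As a rough illustration of how a consumer might use such values, here is a minimal Python sketch for the "remove spoken metadata before analysis" use case. It assumes ROIs shaped like the audio example above, times in seconds, the made-up verbalIntroduction IRI, and a hypothetical recording duration of 15 seconds; the function name is also an assumption:

```python
# Made-up controlled value IRI taken from the audio example.
VERBAL_INTRO = "http://rs.tdwg.org/acdepicts/value/verbalIntroduction"


def intervals_to_keep(rois, duration, excluded_cvterm):
    """Return (start, end) intervals of the recording not covered by excluded ROIs."""
    excluded = sorted(
        (roi["ac:startTime"], roi["ac:endTime"])
        for roi in rois
        if roi.get("Iptc4xmpExt:CVterm") == excluded_cvterm
    )
    keep = []
    cursor = 0.0
    for start, end in excluded:
        if start > cursor:
            keep.append((cursor, start))
        # max() handles excluded regions that overlap each other.
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


rois = [
    {"Iptc4xmpExt:CVterm": VERBAL_INTRO,
     "ac:startTime": 11.2, "ac:endTime": 11.9},
    {"Iptc4xmpExt:CVterm": "http://rs.tdwg.org/acdepicts/value/organismSound",
     "ac:startTime": 3.0, "ac:endTime": 3.6},
]

# Hypothetical total duration of 15 seconds.
kept = intervals_to_keep(rois, duration=15.0, excluded_cvterm=VERBAL_INTRO)
print(kept)  # [(0.0, 11.2), (11.9, 15.0)]
```

The same comparison on Iptc4xmpExt:CVterm would also select only the organism sounds or, for images, only the labels.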

baskaufs commented 2 years ago

With respect to whether there is an implied ROI that includes the entire media item, it seems like we pretty much have to allow that since we can hardly demand that providers who can only handle flat records support the one-to-many image:ROI relationship. For example, I think some people will want to apply (or are already applying) ac:subjectPart and ac:subjectOrientation to entire images if they only depict a single part, while others who have images that depict many organisms or parts will want to break down the image into ROIs so that they can describe each part.

I think it would be reasonable for a provider who wanted to aggregate both kinds of records to apply a "default ROI" assumption similar to what @edwbaker and @deepreef were discussing to split off the fields that would normally go into an ROI if it were there (such as ac:subjectPart) into an ROI whose bounds were the entire image.
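
A minimal Python sketch of that "default ROI" assumption, for an aggregator receiving flat records. The choice of which terms move into the ROI, the function name, and the example record (including its values) are assumptions for illustration; the sketch also assumes the flat record has no ac:hasROI of its own:

```python
# Assumption: the terms that would normally live on an ROI.
ROI_LEVEL_TERMS = {"ac:subjectPart", "ac:subjectOrientation"}


def add_default_roi(flat_record):
    """Split ROI-level terms of a flat record into an ROI covering the whole image."""
    media = {k: v for k, v in flat_record.items() if k not in ROI_LEVEL_TERMS}
    roi_terms = {k: v for k, v in flat_record.items() if k in ROI_LEVEL_TERMS}
    if roi_terms:
        # Bounds of the default ROI are the entire media item.
        default_roi = {
            "ac:xFrac": 0.0,
            "ac:yFrac": 0.0,
            "ac:widthFrac": 1.0,
            "ac:heightFrac": 1.0,
        }
        default_roi.update(roi_terms)
        media["ac:hasROI"] = [default_roi]
    return media


# Hypothetical flat record with made-up identifier and values.
flat = {
    "@id": "https://example.org/media/1",
    "ac:subjectPart": "leaf",
    "ac:subjectOrientation": "dorsal",
}
normalized = add_default_roi(flat)
```

Records that already use ROIs would pass through a different path, so both kinds end up with the same shape.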

edwbaker commented 2 years ago

This makes sense to me.

I understand the point made by @deepreef about separating things about media, and things depicted in the media, into classes - but can this be solved in a similar way? (some terms are clearly about the media itself, while others are clearly about what is depicted?) Maybe we can have this as a philosophical assumption, and ideally document this assumption, and only formalise it if there becomes a pressing reason to do so?

baskaufs commented 2 years ago

@edwbaker I think that it would be appropriate to make the philosophical distinction about what properties belong with particular classes (media item, ROI, SAP). Whether people would sort the properties out that way would depend on how fully normalized their data management system is.

I think what we may need is an overhaul of the Audubon Core Structure document. For better or worse, it sets the precedent of allowing people to flatten out SAP records in a number of ways if they want to cram their normalized data into a single table. Now that we've added ROIs as another class under which properties can be organized, the waters are even muddier. So documenting the assumptions (without necessarily making them normative, at least for the time being) would probably help.

baskaufs commented 2 months ago

This issue is handled by #246, which is going to public comment.