tdwg / ac

Audiovisual Core
http://www.tdwg.org/standards/638
Creative Commons Attribution 4.0 International

How do we link media items to occurrence records? #191

Open baskaufs opened 3 years ago

baskaufs commented 3 years ago

Currently Audubon Core has ac:associatedObservationReference and ac:associatedSpecimenReference to link a media item to observations and specimens. However, occurrences can be considered as distinct from specimens and observations (as forms of evidence) and may have different identifiers (occurrenceID vs. specimenID for example). So is there a term missing for linking a media item specifically to an occurrenceID?

There are also cases where an occurrence is documented specifically by a media item without any other form of evidence (e.g. camera trap). What term would be used to make those kinds of links?

We should look at the XML schema for the AC IPTC extension to see what term (if any) is used to indicate the "foreign key" in the extension table that makes the link to the occurrence record in the core table. That relationship is the one we are talking about here.

edwbaker commented 8 months ago

@tucotuco - on the recommendation of @baskaufs, would you be willing to come and discuss this at a future AC Maintenance Group?

tucotuco commented 8 months ago

Certainly.

davidsean commented 1 week ago

There are also cases where an occurrence is documented specifically by a media item without any other form of evidence (e.g. camera trap). What term would be used to make those kinds of links?

I think occurrenceID would work when the occurrence itself carries a basisOfRecord that qualifies the type of occurrence (HumanObservation vs. MachineObservation).

tucotuco commented 1 week ago

That's exactly what happens when publishing a Darwin Core Archive with an extension for AC (https://rs.gbif.org/extension/ac/audubon_2020_10_06.xml). The extension has the occurrenceID for the Occurrence the media is related to.
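A minimal sketch of the join described above, assuming an archive whose Occurrence core and media extension share an occurrenceID column. The rows, identifiers, column selection, and URLs are invented for illustration; a real archive declares its files and columns in meta.xml.

```python
import csv
import io
from collections import defaultdict

# Hypothetical core table: one row per Occurrence.
OCCURRENCE_CSV = """occurrenceID,basisOfRecord,scientificName
occ-1,MachineObservation,Panthera onca
occ-2,HumanObservation,Tapirus terrestris
"""

# Hypothetical extension table: one row per media item,
# carrying the occurrenceID of the Occurrence it relates to.
MEDIA_CSV = """occurrenceID,identifier,accessURI
occ-1,media-1,https://example.org/media/1.jpg
occ-1,media-2,https://example.org/media/2.jpg
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def media_by_occurrence(media_rows):
    """Group media identifiers under the occurrenceID they point to."""
    grouped = defaultdict(list)
    for row in media_rows:
        grouped[row["occurrenceID"]].append(row["identifier"])
    return dict(grouped)


occurrences = read_rows(OCCURRENCE_CSV)
media_index = media_by_occurrence(read_rows(MEDIA_CSV))

for occ in occurrences:
    linked = media_index.get(occ["occurrenceID"], [])
    print(occ["occurrenceID"], occ["basisOfRecord"], linked)
```

In this sketch the camera-trap case is simply an Occurrence with basisOfRecord set to MachineObservation and one or more media rows pointing at it.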

deepreef commented 1 week ago

I'm just seeing this thread now for the first time (sorry... been a bit unplugged from the biodiversity data world, as I've been swallowed by the biodiversity discovery and documentation world...)

I gather from the responses above that this has been resolved (more or less) in TDWG-space, but because Rob Whitton and I spent a LOT of time thinking about this stuff, I thought I'd share how we ended up handling this.

To begin with, we treat records representing media (images, videos, audio recordings, etc.) as a form of "Evidence" in our data model. Some may recall that we've discussed the idea of establishing an "Evidence" class within DwC (not sure if that ever went anywhere -- again, I've been unplugged). In our (biodiversity data management) context, "Evidence" is primarily useful in two somewhat different ways:

1) "Evidence of Occurrence" (i.e., evidence that a particular Organism was, indeed, present during a particular Event).
2) "Evidence of Taxonomic Identification" (i.e., evidence supporting an assertion that a particular Organism is appropriately assigned to a particular TaxonConcept/Circumscription).

This thread is focused on the former, so I'll limit my descriptions accordingly.

Essentially, we support a M:M relationship between Evidence and Occurrence instances (which we achieve in our implementation using an "EvidenceOccurrence" join table). The reason for M:M is that any given piece of Media (e.g., a video taken on a Coral Reef) might serve as Evidence for the presence of many different Organisms at a particular Event (i.e., many different Occurrences), and conversely, any given Occurrence might be supported by multiple pieces of Media to serve as evidence (e.g., I took three photos of the same fish at the same time on the same reef).
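A minimal sketch of such a join table, using an in-memory SQLite database. Only the "EvidenceOccurrence" table name comes from the description above; the column names and sample rows are assumptions made for illustration, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Evidence (evidenceID TEXT PRIMARY KEY, description TEXT);
CREATE TABLE Occurrence (occurrenceID TEXT PRIMARY KEY, description TEXT);
-- Join table: one row per (media item, occurrence) pair.
CREATE TABLE EvidenceOccurrence (
    evidenceID   TEXT NOT NULL REFERENCES Evidence(evidenceID),
    occurrenceID TEXT NOT NULL REFERENCES Occurrence(occurrenceID),
    PRIMARY KEY (evidenceID, occurrenceID)
);
""")

conn.executemany("INSERT INTO Evidence VALUES (?, ?)", [
    ("video-1", "reef video showing several fish"),
    ("photo-1", "first photo of fish A"),
    ("photo-2", "second photo of fish A"),
    ("photo-3", "third photo of fish A"),
])
conn.executemany("INSERT INTO Occurrence VALUES (?, ?)", [
    ("occ-fishA", "fish A on the reef"),
    ("occ-fishB", "fish B on the reef"),
])
conn.executemany("INSERT INTO EvidenceOccurrence VALUES (?, ?)", [
    ("video-1", "occ-fishA"),   # one video, many occurrences
    ("video-1", "occ-fishB"),
    ("photo-1", "occ-fishA"),   # one occurrence, many photos
    ("photo-2", "occ-fishA"),
    ("photo-3", "occ-fishA"),
])


def occurrences_for(evidence_id):
    """All occurrences a given media item serves as evidence for."""
    rows = conn.execute(
        "SELECT occurrenceID FROM EvidenceOccurrence "
        "WHERE evidenceID = ? ORDER BY occurrenceID", (evidence_id,))
    return [r[0] for r in rows]


def evidence_for(occurrence_id):
    """All media items that serve as evidence for a given occurrence."""
    rows = conn.execute(
        "SELECT evidenceID FROM EvidenceOccurrence "
        "WHERE occurrenceID = ? ORDER BY evidenceID", (occurrence_id,))
    return [r[0] for r in rows]


print(occurrences_for("video-1"))
print(evidence_for("occ-fishA"))
```

The composite primary key keeps each media-to-occurrence pairing unique while allowing either side to appear in many rows.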

That's the easy part.

Where things get a bit trickier is the (VERY common) scenario where (e.g.) we go out for a dive and take a bunch of videos of fish on the reef, then collect some of them as specimens, bring them back to base camp, properly prepare them, and take studio "portraits" of the specimens (e.g., with their fins all pinned out, etc.). I use one of our real-world examples here, but I'm sure the same applies to all manner of naturalists who make in-situ images of organisms before collecting them as specimens, and then later create images of them after they have been curated as specimens.

The lazy solution is to link both the in-situ video of the fish on the reef, and the studio portrait image of the same fish back at base camp, to the same Occurrence instance (i.e., the fish-Organism as it occurred on the reef). I call this "lazy" because, while it does accurately represent the video (in this example) as Evidence-of-Occurrence (on the reef), it incorrectly represents the studio portrait image as Evidence-of-Occurrence (on the reef).

This conundrum led us to expand our interpretation of the scope of "Occurrence" to accommodate much more than just "natural" occurrences. Of course, DwC has always accommodated this (e.g., dwc:degreeOfEstablishment), but mostly in the context of captive/living things. For a while, we dealt with this by minting Occurrence instances to track the presence of the physical Organism itself in multiple events (we define an "Occurrence" as the intersection of an "Organism" and an "Event"). Thus, we created what we call "Imaging Events", which represent the time and place where specimens are prepared and photographed in a studio setting (e.g., at "base camp" when we're in the field). That way, the video media item can be linked to the occurrence of the organism as it swam on the reef (before its unfortunate demise), and the studio portrait image can be linked to the occurrence of the organism as it sat dead in a photo-tank back at base camp. Because both Occurrence instances are linked to the same Organism instance, the two media items (video + studio portrait) can be connected via the shared Organism instance.
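A minimal sketch of this arrangement, assuming each Occurrence is the intersection of one Organism and one Event, and each media item is linked to one Occurrence. The class names, field names, and identifiers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    organism_id: str   # the Organism that was present
    event_id: str      # the Event it was present at


@dataclass(frozen=True)
class MediaLink:
    media_id: str
    occurrence_id: str


occurrences = [
    Occurrence("occ-reef", "org-1", "event-dive"),       # fish swimming on the reef
    Occurrence("occ-studio", "org-1", "event-imaging"),  # same fish at the imaging event
    Occurrence("occ-other", "org-2", "event-dive"),      # a different fish, same dive
]

links = [
    MediaLink("video-1", "occ-reef"),
    MediaLink("portrait-1", "occ-studio"),
    MediaLink("video-2", "occ-other"),
]


def media_for_organism(organism_id):
    """Media linked to any Occurrence of the given Organism."""
    occ_ids = {o.occurrence_id for o in occurrences if o.organism_id == organism_id}
    return sorted(link.media_id for link in links if link.occurrence_id in occ_ids)


def related_media(media_id):
    """Other media connected to media_id through a shared Organism."""
    occ_by_id = {o.occurrence_id: o for o in occurrences}
    organism_ids = {
        occ_by_id[link.occurrence_id].organism_id
        for link in links
        if link.media_id == media_id
    }
    related = set()
    for organism_id in organism_ids:
        related.update(media_for_organism(organism_id))
    related.discard(media_id)
    return sorted(related)


print(related_media("video-1"))
```

Each media item stays attached to the Event where it was actually made, and the connection between the reef video and the studio portrait is derived rather than stored.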

This works reasonably well in our data model implementation. However, it also put a spotlight on the subtle distinction between "Organism" and "MaterialSample" in the DwC context. Some of you may recall that I publicly wrestled with this issue in TDWG-land a number of years ago, and honestly I never got a satisfactory answer. The issue boils down to: "When does an Organism become a MaterialSample?" The best answers (in my opinion) allow for both to exist simultaneously (in that the physical material making up a living thing can be captured informatically as both an "Organism" instance and a "MaterialSample" instance at the same time). Lots of opportunities for lofty philosophical meanderings about this, but putting those aside, the more practical issue is: "Should we accommodate the ability to link a Media item directly to a MaterialSample instance, or should such links be inherited through an Evidence-EvidenceOccurrence-Occurrence-Organism-MaterialSample chain?"

I wish I had a satisfactory answer, but I don't. In one sense, of COURSE we need to be able to link media items directly to MaterialSample instances (I can provide all sorts of examples for this reasoning). What makes it tricky is deciding WHEN to link a media item directly to a MaterialSample instance, and WHEN to link it to the associated Organism (and when to, perhaps, link it to both simultaneously). Again, I wish I had a satisfactory answer, but at this stage I don't.

When I was thinking a lot about this stuff, I started down the path of the following line of thought:

In DwC, "Organism" doesn't quite work because ultimately the scope of an Organism instance is conceptual, not physical. "MaterialSample" doesn't work because it represents only a subset of "Material" things (i.e., those that are "sampled").

So, my bottom line conclusion is that there is no easy way within DwC to model the relationship between a media item and the content it represents (whether in the form of "Evidence" or as something else). Moreover, developing a model that accommodates the "reality" of what Media items are important for in the context of the biodiversity data landscape would probably be so convoluted as to be utterly impractical in a general data standard.

As an aside, our model is not limited to media associated with living things. Our "Organism" is actually "IndividualOrganism", and its scope includes all manner of things, not just living things.

As another aside, there is also the issue of capturing the "Location" of the content items in an image, vs. the Location of where the image capturing device was (sometimes the difference doesn't matter; sometimes it does).

I apologize for the super-long rambly post here, and I fully acknowledge that it is of extremely limited relevance to the topic at hand. But this particular issue is one that I spent a lot of time thinking about (almost to the point of madness -- some would say "not almost"), and I never felt satisfied with the conclusion. Besides, it's been so long since I've commented on any of these discussions, that I figure it's fair for me to regurgitate so much here all at once. It also feels good to be thinking/writing about this stuff again after a long hiatus.

Retreating to lurking mode....

klausriede commented 23 hours ago

Maybe boil this down and start from the other side: we captured an organism and made pictures and sound recordings in captivity. The specimen turns into a museum voucher specimen with label data pinned underneath (collector, date, etc., perhaps an LSID if databased elsewhere), plus an SD card containing the digital media files (which might be many, so a 1:m relationship: pics from different angles, videos and sound recordings made at distinct times, etc.). Should be easy to model, right?
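A minimal sketch of that 1:m case, assuming each media file records the identifier of the single voucher specimen it depicts. The class names, field names, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    specimen_id: str
    collector: str
    collected_on: str  # ISO 8601 date from the label


@dataclass(frozen=True)
class MediaItem:
    media_id: str
    specimen_id: str   # foreign key on the "many" side
    kind: str          # e.g. "image", "video", "sound"
    recorded_on: str


voucher = Specimen("spec-1", "A. Collector", "2023-05-01")

media_items = [
    MediaItem("img-dorsal", "spec-1", "image", "2023-05-02"),
    MediaItem("img-lateral", "spec-1", "image", "2023-05-02"),
    MediaItem("snd-1", "spec-1", "sound", "2023-05-03"),
    MediaItem("img-unrelated", "spec-2", "image", "2023-05-04"),
]


def media_for(specimen_id, items):
    """All media items that point at the given specimen."""
    return [m.media_id for m in items if m.specimen_id == specimen_id]


print(media_for(voucher.specimen_id, media_items))
```

This covers only the captive-specimen case; it does not address the in-situ versus studio distinction raised earlier in the thread.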