tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Crosslink to Audubon Core definitions (with `skos:`) where appropriate #191

Closed peterdesmet closed 2 years ago

peterdesmet commented 2 years ago

As we did with Dublin Core and Darwin Core, crosslink to Audubon Core for terms that are equivalent and defined there.

ben-norton commented 2 years ago

Agree. https://ac.tdwg.org/termlist/

peterdesmet commented 2 years ago

Note that the relationships described below are item -> relatedItem, while the skos matches are actually relatedItem -> item. This was fixed in https://github.com/tdwg/camtrap-dp/commit/48b7243e7abff54ac5a50e87e0e58e099575637a

I went over all the AC terms to see which ones could be linked to from Camtrap DP terms. Below are my conclusions/remarks. I'll update the icon with ✅ or ❌ once there is feedback whether it should be respectively adopted or not. Use the numbers to refer to a specific item.

@tdwg/ac, @edwbaker, @baskaufs your feedback would be much appreciated! Even if we can already check off some of the easier ones.

  1. ❌ Ignored borrowed dwc terms, since those are already included in Camtrap DP where relevant.

  2. rightsHolder links to http://purl.org/dc/terms/rightsHolder. I don't think I will add AC's suggested http://ns.adobe.com/xap/1.0/rights/Owner or replace the current link with it. xmpRights:Owner is technically better since it takes a literal, but it does not cover organizations managing rights.

  3. ☑️ observationType in a way describes what a (sequence of) images contains, and could be seen as a Subject Category with its own vocabulary. Values are strings. According to the definition:

    as long as all unqualified terms in all vocabularies are unique, metadata SHOULD provide the source vocabularies using the Subject Category Vocabulary term

    Which I guess means we should then also add a Subject Category Vocabulary term? That is a bit of a loop, because observationType is both a Subject Category (when used in data) and a Subject Category Vocabulary (because the specs have an enumeration of allowed values). Not sure what the best approach is here.

  4. tags could be seen as an ac:tag. However, Camtrap DP tags relate to the camera deployment, so a tag could be winter 2020, while AC tags likely refer to media files? Maybe a skos:broadMatch? Yes, broadMatch.

  5. mediaID is maybe a skos:narrowMatch of http://purl.org/dc/terms/identifier? But so are the other ID fields, which already have more specific dwc IDs. Worth adding? narrowMatch with identifier in media; don't add to other fields.

  6. favourite - TRUE if media is of interest - doesn't really match with rating in my opinion, because that one specifically requires a 1 to 5 (star rating) range. I don't see much advantage adopting a 1-5 rating for our use case.

  7. comments - Comments or notes about the media file. - does seem to match with ac:comments, but the latter one is probably more broadly applicable than comments about a media file, so maybe we have to opt for skos:narrowMatch here? As with ID fields, deployment and observation comments already use more specific dwc remark fields. exactMatch with ac:comments, don't add to other fields

  8. ☑️ _id - occurs for deployments, media and observations with a definition like "Unique identifier of the x as assigned by the data management system." Can it be considered an ac:providerManagedID? Is this then a skos:exactMatch or skos:narrowMatch?

  9. timestamp - Date and time when the media file was recorded - is likely a skos:exactMatch for xmp:CreateDate.

  10. taxonomic gives an overview of all taxa in a dataset. This is not the same as ac:taxonCoverage which gives a single taxon name at a higher level that encompasses all taxa seen in an image.

  11. cameraModel is likely a skos:exactMatch of ac:captureDevice. We are just a bit more strict with how it should be written (make + model), so maybe it is skos:narrowMatch? Yes, narrowMatch.

  12. locationName already links to http://rs.tdwg.org/dwc/terms/locality, but could also be seen as Location Created, but it won't be directly associated with the media files (it's in deployment) and is always the location depicted, so there is no confusion there. Would not add this.

  13. captureMethod - values motion detection and time lapse - is likely a skos:narrowMatch of ac:resourceCreationTechnique. The latter is free text (no vocabulary), so I consider this a narrowMatch.

  14. ❌ An observation in Camtrap DP is associated with a media file, not the other way around, so we can't use ac:associatedObservationReference.

  15. filePath is a URL or local path to a media file (which representation of the file is not specified). Is this an ac:accessURI or ac:ServiceAccessPoint? I don't really understand these concepts or whether they accept local paths too. And do we then have to provide ac:hasServiceAccessPoint somewhere as well? narrowMatch of ac:accessURI, which allows online/offline resources.

  16. ❌ The AC Region of Interest Vocabulary terms might come in handy if Camtrap DP is extended to support indicating where e.g. an animal was observed in an image, but not right now.

  17. ❌ Could an observation based on a (sequence of) image(s) be considered a region of interest? In addition, I'm confused by the terms ac:hasROI and ac:isROIOf. It looks like the definitions got switched. The first must be a media file, the second a ROI, but the second says "The media item within which a region of interest is located.". Don't consider observations as a ROI of a (sequence of) images.

  18. ❌ Observation timestamp already has http://rs.tdwg.org/dwc/terms/eventDate, but also says "for sequence-based observations the timestamp of the first media file in the associated sequence (in sequenceID)." so it could be seen as ac:startTimestamp, but I'm not convinced it adds much?

danstowell commented 2 years ago

Peter's crosslink suggestions seem good to me. Some quick responses:

I would agree with all the "ticked boxes" that I haven't mentioned.

peterdesmet commented 2 years ago

@danstowell Thank you very much for the review!

peterdesmet commented 2 years ago

If I consider 17 answered (correct me if my suggestion for it in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022357625 is incorrect), then there are only 2 questions left: observationType and internal _id. See ☑️ in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022173750 for more details.

baskaufs commented 2 years ago

I think the issue with dcterms:rightsHolder that may have caused AC to prefer xmpRights:Owner is that the Dublin Core term should properly be used with a URI and not a string value. My guess is that a lot of people are using dcterms:rightsHolder (incorrectly) with strings anyway, so that may be a moot point. See section 3.3 and footnote 6 of the Darwin Core RDF Guide for more details.

baskaufs commented 2 years ago

I've read through the mappings and the subsequent comments up to this one and I think it all makes sense to me except for one thing. I am confused about how you are interpreting regions of interest in this statement:

I would then add ac:isROIOf to two fields in observations: sequenceID and mediaID because both levels can be used to base observations on.

I would say that the concept of regions of interest is this: a media item can contain one or more regions of interest that are defined by their spatial or temporal extent. You'd link the media item to its ROI by ac:hasROI and you'd link the ROI to the media item with ac:isROIOf. But you would not link an occurrence to a ROI or vice versa with either ac:hasROI or ac:isROIOf.

If you wanted to link an occurrence to a ROI, you'd need the (not yet minted) dwc:evidence (or dwc:hasEvidence) and make a statement like this:

occurrence dwc:hasEvidence ROI.

I would not say:

occurrence ac:hasROI ROI

if that was what you meant because occurrences don't have regions of interest, media items do.

To link the other way you'd use something like ac:associatedObservationReference (if the occurrence was considered an observation):

ROI ac:associatedObservationReference occurrence.

There is an example of this in the Recipes document. It's a real example where the fruit part of the image of the plant was used to indicate the existence of an occurrence in GBIF.
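As a rough illustration of the linking pattern described above, here is a minimal Python sketch. It is not taken from the Recipes document: the node names (`image1`, `roi1`, `occ1`) and the `expected` table are hypothetical, and dwc:hasEvidence is not a minted term; it appears only because the comment proposes it.

```python
# Hypothetical sketch: statements as (subject, predicate, object) tuples.
MEDIA, ROI, OCCURRENCE = "media item", "region of interest", "occurrence"

# What kind of resource each example node is (illustrative labels only).
node_kind = {"image1": MEDIA, "roi1": ROI, "occ1": OCCURRENCE}

# Expected (subject kind, object kind) per predicate, following the comment.
# Assumption: dwc:hasEvidence does not exist yet; it is only a proposal.
expected = {
    "ac:hasROI": (MEDIA, ROI),
    "ac:isROIOf": (ROI, MEDIA),
    "dwc:hasEvidence": (OCCURRENCE, ROI),
    "ac:associatedObservationReference": (ROI, OCCURRENCE),
}


def is_consistent(triple):
    """Return True if subject and object kinds fit the predicate."""
    subject, predicate, obj = triple
    return expected.get(predicate) == (node_kind[subject], node_kind[obj])


good = [
    ("image1", "ac:hasROI", "roi1"),
    ("roi1", "ac:isROIOf", "image1"),
    ("occ1", "dwc:hasEvidence", "roi1"),
    ("roi1", "ac:associatedObservationReference", "occ1"),
]

# The statement advised against: occurrences don't have regions of interest.
bad = ("occ1", "ac:hasROI", "roi1")

for triple in good + [bad]:
    print(triple, "consistent" if is_consistent(triple) else "inconsistent")
```

The check is deliberately simplistic: it only encodes the subject and object kinds named in this comment, not any formal domain or range declarations.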


I could see that if you were considering a sequence of images to be a media item in its own right, then I suppose one of the component images could be considered an ROI of it because it is a part of the media item defined by temporal extent, the same way a frame of a video could be a temporal ROI of the whole video. But I'm not sure that's what you were saying.

I may also just be totally misunderstanding what you intended.

peterdesmet commented 2 years ago

Thanks @baskaufs!

  1. Regarding rightsHolder, I'll see if I'll update that. So far it was useful we could basically copy/paste the definition and link to a term people know, but I see your point.

  2. Regarding regions of interest: you read it correctly. occurrence dwc:hasEvidence ROI would be more apt, and I won't consider occurrences ROIs then. I won't use ac:associatedObservationReference because we don't make the link in that direction: Camtrap DP only links observation -> (sequence of) images, not image -> observation.

Two ☑️ points (3. and 8.) remain in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022173750, suggestions welcome (it's regarding Subject category and providerManagedID)

peterdesmet commented 2 years ago

Question about narrowMatch and broadMatch. From Wikipedia (emphasis mine):

The property related simply makes an association relationship between two concepts; no hierarchy or generality relation is implied. The properties broader and narrower are used to assert a direct hierarchical link between two concepts. The meaning may be unexpected; the relation A broader B means that A has a broader concept called B—hence that B is broader than A. Narrower follows in the same pattern.

...

SKOS mapping properties are intended to express matching (exact or fuzzy) of concepts from one concept scheme to another, and by convention are used only to connect concepts from different schemes. The concepts relatedMatch, broadMatch, and narrowMatch are a convenience, with the same meaning as the semantic properties related, broader, and narrower. (See previous section regarding the meanings of broader and narrower.)

So is it:

Dog broadMatch Animal # Dog has a broader concept called Animal
Animal narrowMatch Dog # Animal has a narrower concept called Dog

Or the other way around?

danstowell commented 2 years ago

Ah I see. So yes, you have stated it the correct way round.

peterdesmet commented 2 years ago

Aha, all the narrow/broad matches in Camtrap DP are incorrect and should be reversed, including those mentioned in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022357625
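A minimal Python sketch of that reversal, assuming each mapping is stored as term -> (property, target). The mappings listed are the ones proposed earlier in this thread; the function name and the data layout are hypothetical and do not reflect how Camtrap DP stores them.

```python
# SKOS reads "A skos:broadMatch B" as "A has a broader concept B".
# Swapping the hierarchical properties fixes statements that were written
# the other way round; exactMatch is symmetric and stays unchanged.
INVERSE = {
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
    "skos:exactMatch": "skos:exactMatch",
}


def swap_hierarchical_matches(mappings):
    """Return a copy of the mappings with broadMatch/narrowMatch swapped."""
    return {
        term: (INVERSE[prop], target)
        for term, (prop, target) in mappings.items()
    }


# Mappings as first proposed in this thread (direction was mistaken).
as_proposed = {
    "tags": ("skos:broadMatch", "ac:tag"),
    "cameraModel": ("skos:narrowMatch", "ac:captureDevice"),
    "captureMethod": ("skos:narrowMatch", "ac:resourceCreationTechnique"),
    "comments": ("skos:exactMatch", "ac:comments"),
}

corrected = swap_hierarchical_matches(as_proposed)
for term, (prop, target) in corrected.items():
    print(term, prop, target)
```

With this reading, cameraModel skos:broadMatch ac:captureDevice says that cameraModel has a broader concept ac:captureDevice, which matches the Dog broadMatch Animal example.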

edwbaker commented 2 years ago

For information, at the most recent Audubon Core meeting we decided that something we should focus on is dealing with collections of media items (e.g. a single camera trap deployment would fit this) and also the ordering of media items within such a collection. We have scheduled another meeting for 6th April if you'd like to share any thoughts/requirements?

baskaufs commented 2 years ago

OK, I'm going to take up point 3. The Iptc4xmpExt:CVterm has always seemed a bit weird to me and a bit of a kludge. The term isn't really being used in the manner recommended in its definition, the recommendation of adding ac:subjectCategoryVocabulary to help people figure out what the term means isn't guaranteed to work, and even looking up some of the suggested vocabularies is difficult (some only have a search box and no easily accessible list of terms, one seems to only be available in German, etc.).

It seems to me that what might be good is to just fix this on the Audubon Core side. At the time the vocabulary was originally developed, nobody in TDWG had figured out how to actually describe a controlled vocabulary in a standardized way. We now know how to do that. What makes sense to me is to just go to all of the recommended vocabularies, try to get a dump of all of their terms, and create a single list somewhere in the Audubon Core space to which people could refer. I'm not sure this would need to be adopted as an official TDWG controlled vocabulary, but it would serve more as an official "lookup dictionary" for the recommended vocabularies.

As part of that process, I would be interested to know how many of the controlled strings from the different vocabularies actually overlap. If there are none or few, then we could just direct people to this list and tell them to use the strings we have there. We could then skip the whole thing of trying to provide (unreliable) information about the controlled vocabularies via the ac:subjectCategoryVocabulary term.

I would be willing to give a shot at trying to grab the data and create such a list. It would be annoying to do, but not that difficult. I probably don't have time to do it in the next week or two, but could probably do it on the time scale of the next few months.

Thoughts @edwbaker ?

baskaufs commented 2 years ago

With respect to point 8, it seems to me that ac:providerManagedID could be considered an exact match in the circumstance where the subject resource is a media item. No TDWG vocabularies have official domain declarations, but I think in the case of Audubon Core, there is a sort of underlying assumption that the subject resource is a media item. There are certainly cases in other vocabularies where a property is appropriated for use with a subject other than the one originally intended. But in this case it would seem somewhat weird to me to use that term with something that isn't a media item.

For observations, it seems like dwc:occurrenceID would be a close match.

I don't think there is anything existing in TDWG currently that would correspond to a deployment ID. It seems like this would be more in the realm of Humboldt Core, but there isn't anything in the most recent list of terms that seems similar. Humboldt Core is really more about describing surveys than about creating metadata terms for the surveys themselves, so that's probably why they haven't gotten granular enough to have a term like deployment ID.

edwbaker commented 2 years ago

@baskaufs - having Iptc4xmpExt:CVterm and ac:subjectCategoryVocabulary does seem a bit less than ideal. It's definitely worth having a look at what we can do.

Possibly we should discuss minting ac:subjectCategory and ac:subjectCategoryLiteral (with only the latter needing ac:subjectCategoryVocabulary)? It looks like the ac usage of CVterm might push the limits of how IPTC define it?

It would be useful for me to get insights into the original ac thinking, if that's still available.

baskaufs commented 2 years ago

Yes, I can try to dig through the past history to see if there was any discussion on that topic. I remember when the discussion/review was going on, I think I asked why they didn't just use dc:/dcterms:subject, which seemed more well-known. But there was some reason for not doing that and I don't remember what it was. It's possible that might be an option over minting new terms in the ac: namespace.

There was a lot of the discussion that was archived in the old Google Code site. I will rummage around and see if the issues tracker there is still viewable. The discussion probably would be there.

baskaufs commented 2 years ago

Here's the tracker: https://code.google.com/archive/p/auduboncore/issues I see immediately that there was some discussion about proper use of the term: https://code.google.com/archive/p/auduboncore/issues/88 but it was after ratification, I think.

baskaufs commented 2 years ago

Hmmm. Well, I've looked through the tracker and also searched my emails from 2011 to 2013 while I was review manager and I can't find anything by a visual scan of subject lines or by searching the text of the emails for "dcterms:subject". So I may just be misremembering or it may have been part of a discussion pre-ratification when they were actually building AC. The Google Code issues don't go back into that era, so this may just be lost information.

The only other thing I can think of is to ask Gregor Hagedorn or Annette Olson if they remember anything about why dcterms:subject wasn't used. Neither is actively involved in TDWG any more, but they were probably the most active of the Task Group members doing the actual editing. Vijay Barve was also on the task group and is active in AC now, so we could also ask him if he remembers anything about this.

edwbaker commented 2 years ago

Thanks for checking. I think for now I should create a new issue on the ac repository as something to discuss at the next ac meeting?

ben-norton commented 2 years ago

I'm not sure this would need to be adopted as an official TDWG controlled vocabulary, but it would serve more as an official "lookup dictionary" for the recommended vocabularies.

As part of that process, I would be interested to know how many of the controlled strings from the different vocabularies actually overlap. If there are none or few, then we could just direct people to this list and tell them to use the strings we have there. We could then skip the whole thing of trying to provide (unreliable) information about the controlled vocabularies via the ac:subjectCategoryVocabulary term.

I would be willing to give a shot at trying to grab the data and create such a list. It would be annoying to do, but not that difficult. I probably don't have time to do it in the next week or two, but could probably do it on the time scale of the next few months.

@baskaufs

I'm actively working on a similar activity with the exact same question in a different context. I'd be happy to assist if there's a fit.

baskaufs commented 2 years ago

Thanks @ben-norton ! I was thinking that I'd just try to download the most machine-readable version of each of the vocabularies and munge them into a CSV that could be used as-is and also be transformed into JSON-LD. If you have a less labor-intensive idea on how to do it, I'd love to talk about it.

peterdesmet commented 2 years ago

Thanks @baskaufs! Regarding observationType (point 3) it seems that for now we just sit tight and wait until this is resolved at AC. Good that an issue will be created there, because this issue is close to being closed.

Regarding _id (point 8), I'll do an exactMatch with https://ac.tdwg.org/termlist/#ac_providerManagedID for _id in media. I won't map _id in deployments or observations. In all three cases it is a:

Unique identifier of the [deployment/observation/media file] as assigned by the data management system.

ac:providerManagedID fits that definition, but there are no equivalents for deployments and observations. Note that the "public" observationID and deploymentID are already mapped to dwc:occurrenceID and dwc:eventID respectively.
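A minimal Python sketch of how these identifier mappings might look as field descriptors. It assumes the crosslink is stored under a `skos:exactMatch` key on each field and that the AC and Darwin Core term IRIs use the namespaces below. The helper names and the descriptor layout are hypothetical, not the actual Camtrap DP schema files.

```python
# Assumed namespaces for the term IRIs (not stated in this thread).
AC = "http://rs.tdwg.org/ac/terms/"
DWC = "http://rs.tdwg.org/dwc/terms/"

# (table, field) -> mapped term, following the decision described above.
# Internal _id in deployments and observations is deliberately left unmapped.
ID_MAPPINGS = {
    ("media", "_id"): AC + "providerManagedID",
    ("observations", "observationID"): DWC + "occurrenceID",
    ("deployments", "deploymentID"): DWC + "eventID",
}


def field_descriptor(table, name):
    """Build a minimal field descriptor, adding skos:exactMatch if mapped."""
    descriptor = {"name": name, "type": "string"}
    match = ID_MAPPINGS.get((table, name))
    if match is not None:
        descriptor["skos:exactMatch"] = match
    return descriptor


print(field_descriptor("media", "_id"))
print(field_descriptor("deployments", "_id"))
print(field_descriptor("deployments", "deploymentID"))
```

Only the media _id carries the AC link; the deployment and observation _id fields come out without a `skos:exactMatch` key.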

peterdesmet commented 2 years ago

AC links implemented in #198. More (like observationType) can be added in the future.