peterdesmet closed this issue 2 years ago.
Note that the relationships described below are written as item -> relatedItem, while the SKOS matches actually run relatedItem -> item. This was fixed in https://github.com/tdwg/camtrap-dp/commit/48b7243e7abff54ac5a50e87e0e58e099575637a
I went over all the AC terms to see which ones could be linked to from Camtrap DP terms. Below are my conclusions/remarks. I'll update each icon to ✅ or ❌ once there is feedback on whether the item should be adopted or not. Use the numbers to refer to a specific item.
@tdwg/ac, @edwbaker, @baskaufs: your feedback would be much appreciated, even if it only lets us check off some of the easier ones already.
1. ❌ Ignored borrowed dwc terms, since those are already included in Camtrap DP where relevant.
2. ✅ rightsHolder links to http://purl.org/dc/terms/rightsHolder. I don't think I'll add or replace this with AC's suggested http://ns.adobe.com/xap/1.0/rights/Owner. xmpRights:Owner is technically better since it takes a literal, but it does not cover organizations managing rights.
3. ☑️ observationType in a way describes what a (sequence of) image(s) contains, and could be seen as a Subject Category with its own vocabulary. Values are strings. According to the definition:
as long as all unqualified terms in all vocabularies are unique, metadata SHOULD provide the source vocabularies using the Subject Category Vocabulary term
Which I guess means we should then also add a Subject Category Vocabulary term? That is a bit of a loop, because observationType is both a Subject Category (when used in data) and a Subject Category Vocabulary (because the specs have an enumeration of allowed values). Not sure what the best approach is here.
4. ✅ tags could be seen as an ac:tag. However, Camtrap DP tags relate to the camera deployment, so tags could be "winter 2020", while AC tags likely refer to media files? Maybe a skos:broadMatch? Decision: yes, broadMatch.
5. ✅ mediaID is maybe a skos:narrowMatch of http://purl.org/dc/terms/identifier? But so are the other ID fields, which already have more specific dwc IDs. Worth adding? Decision: narrowMatch with identifier in media; don't add to other fields.
6. ❌ favourite (TRUE if media is of interest) doesn't really match with rating in my opinion, because that one specifically requires a 1 to 5 (star rating) range. I don't see much advantage in adopting a 1-5 rating for our use case.
7. ✅ comments (Comments or notes about the media file.) does seem to match with ac:comments, but the latter is probably more broadly applicable than comments about a media file, so maybe we have to opt for skos:narrowMatch here? As with the ID fields, deployment and observation comments already use more specific dwc remark fields. Decision: exactMatch with ac:comments; don't add to other fields.
8. ☑️ _id occurs for deployments, media and observations with a definition like "Unique identifier of the x as assigned by the data management system." Can it be considered an ac:providerManagedID? Is this then a skos:exactMatch or a skos:narrowMatch?
9. ✅ timestamp (Date and time when the media file was recorded.) is likely a skos:exactMatch of xmp:CreateDate.
10. ❌ taxonomic gives an overview of all taxa in a dataset. This is not the same as ac:taxonCoverage, which gives a single taxon name at a higher level that encompasses all taxa seen in an image.
11. ✅ cameraModel is likely a skos:exactMatch of ac:captureDevice. We are just a bit more strict about how it should be written (make + model), so maybe it is a skos:narrowMatch? Decision: yes, narrowMatch.
12. ❌ locationName already links to http://rs.tdwg.org/dwc/terms/locality, but could also be seen as Location Created. However, it won't be directly associated with the media files (it's in deployments) and it is always the location depicted, so there is no confusion there. Would not add this.
13. ✅ captureMethod (values motion detection and time lapse) is likely a skos:narrowMatch of ac:resourceCreationTechnique. The latter is free text (no vocabulary), so I consider this a narrowMatch.
14. ❌ An observation in Camtrap DP is associated with a media file, not the other way around, so we can't use ac:associatedObservationReference.
15. ✅ filePath is a URL or local path to a (not further specified) representation of a media file. Is this an ac:accessURI or an ac:ServiceAccessPoint? I don't really understand these concepts, or whether they accept local paths too. And do we then also have to provide ac:hasServiceAccessPoint somewhere? Decision: narrowMatch of ac:accessURI, which allows online/offline resources.
16. ❌ The AC Region of Interest Vocabulary terms might come in handy if Camtrap DP is extended to indicate where, for example, an animal was observed in an image, but not right now.
17. ❌ Can an observation based on a (sequence of) image(s) be considered a region of interest? In addition, I'm confused by the terms ac:hasROI and ac:isROIOf. It looks like the definitions got switched: the first must be a media file, the second an ROI, but the second says "The media item within which a region of interest is located." Decision: don't consider observations an ROI of a (sequence of) image(s).
18. ❌ Observation timestamp already has http://rs.tdwg.org/dwc/terms/eventDate, but its definition also says "for sequence-based observations the timestamp of the first media file in the associated sequence (in sequenceID)", so it could be seen as ac:startTimestamp. I'm not convinced it adds much, though.
Peter's crosslink suggestions seem good to me. Some quick responses:
I would accept all the "ticked boxes" that I haven't mentioned.
@danstowell Thank you very much for the review!
7. Let me split this up into two questions:
- Should ac:comments be linked to media comments, and is that an exactMatch or a narrowMatch (depends on how broad ac:comments is)?
- Should ac:comments be linked to deployment and observation comments when these already link out to http://rs.tdwg.org/dwc/terms/eventRemarks and http://rs.tdwg.org/dwc/terms/occurrenceRemarks respectively? I'm tempted to say no and keep only the more specific dwc term, just like I don't see the point of linking deployment and occurrence ID to http://purl.org/dc/terms/identifier when a more specific dwc term has already been added.
17. Right, the value of those terms (ac:isROIOf/ac:hasROI) would contain the object, not the subject. Those definitions always confuse me. 😅
17. I would then add ac:isROIOf to two fields in observations, sequenceID and mediaID, because both levels can be used to base observations on.
18. I think I'll keep it to the more relevant relation, eventDate. Observations based on a sequence don't even indicate the timestamp of when the animal was seen in the sequence, just the sequence start date.
Other points: I'll close the ones where there is a clear suggestion made by me.
- mediaID then: add http://purl.org/dc/terms/identifier as a narrowMatch for mediaID, but don't add it to deploymentID and observationID, which already have more specific identifiers.
If I consider 17 answered (correct me if my suggestion for it in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022357625 is incorrect), then there are only 2 questions left: observationType and the internal _id. See ☑️ in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022173750 for more details.
I think the issue with dcterms:rightsHolder that may have caused AC to prefer xmpRights:Owner is that the Dublin Core term should properly be used with a URI and not a string value. My guess is that a lot of people are using dcterms:rightsHolder (incorrectly) with strings anyway, so that may be a moot point. See section 3.3 and footnote 6 of the Darwin Core RDF Guide for more details.
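To make the URI-versus-string point concrete, here is a minimal sketch in plain Python (no RDF library) that prints the same rights statement twice as N-Triples: once with dcterms:rightsHolder pointing at an IRI, and once with xmpRights:Owner holding a string literal. The media and organization IRIs are hypothetical placeholders, and the literal escaping only handles quotes and backslashes, not the full N-Triples escaping rules.

```python
# Minimal N-Triples sketch (no RDF library): compare an IRI object with a
# plain string literal for the rights-holder statement.
DCTERMS_RIGHTS_HOLDER = "http://purl.org/dc/terms/rightsHolder"
XMP_RIGHTS_OWNER = "http://ns.adobe.com/xap/1.0/rights/Owner"


def iri(value: str) -> str:
    """Serialize an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping backslashes and double quotes only."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join three already-serialized terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


# Hypothetical identifiers, for illustration only.
media = iri("https://example.org/media/123")
holder = iri("https://example.org/org/research-institute")

# dcterms:rightsHolder with an IRI object (the usage recommended above)
print(triple(media, iri(DCTERMS_RIGHTS_HOLDER), holder))
# xmpRights:Owner with a string literal
print(triple(media, iri(XMP_RIGHTS_OWNER), literal("Research Institute")))
```

The IRI form lets the rights holder (a person or an organization) be described elsewhere; the literal form is just a name.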
I've read through the mappings and the subsequent comments up to this one and I think it all makes sense to me except for one thing. I am confused about how you are interpreting regions of interest in this statement:
I would then add ac:isROIOf to two fields in observations: sequenceID and mediaID because both levels can be used to base observations on.
I would say that the concept of regions of interest is this: a media item can contain one or more regions of interest that are defined by their spatial or temporal extent. You'd link the media item to its ROI with ac:hasROI and you'd link the ROI to the media item with ac:isROIOf. But you would not link an occurrence to a ROI or vice versa with either ac:hasROI or ac:isROIOf.
If you wanted to link an occurrence to a ROI, you'd need the (not yet minted) dwc:evidence (or dwc:hasEvidence) and make a statement like this:
occurrence dwc:hasEvidence ROI.
I would not say:
occurrence ac:hasROI ROI
if that was what you meant, because occurrences don't have regions of interest; media items do.
To link the other way you'd use something like ac:associatedObservationReference (if the occurrence was considered an observation):
ROI ac:associatedObservationReference occurrence.
There is an example of this in the Recipes document. It's a real example where the fruit part of the image of the plant was used to indicate the existence of an occurrence in GBIF.
I could see that if you were considering a sequence of images to be a media item in its own right, then I suppose one of the component images could be considered an ROI of it because it is a part of the media item defined by temporal extent, the same way a frame of a video could be a temporal ROI of the whole video. But I'm not sure that's what you were saying.
I may also just be totally misunderstanding what you intended.
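A toy sketch of the linking pattern described in this comment, using Python tuples as triples. The ex: identifiers are hypothetical, and dwc:hasEvidence is the proposed, not-yet-minted term, so it is only a placeholder here. The check encodes just the rule stated above: an ac:hasROI statement should have a media item as its subject, never an occurrence.

```python
# Toy triple set using prefixed names; every ex: identifier is hypothetical.
# dwc:hasEvidence is NOT a minted Darwin Core term: it stands in for the
# proposed dwc:evidence / dwc:hasEvidence mentioned above.
INVERSES = {"ac:hasROI": "ac:isROIOf", "ac:isROIOf": "ac:hasROI"}
MEDIA_ITEMS = {"ex:image1"}

triples = {
    ("ex:image1", "ac:hasROI", "ex:roi1"),  # media item -> region of interest
    ("ex:occurrence1", "dwc:hasEvidence", "ex:roi1"),  # occurrence -> ROI
    ("ex:roi1", "ac:associatedObservationReference", "ex:occurrence1"),  # ROI -> occurrence
}


def with_inverses(store: set) -> set:
    """Return a copy of the store with the inverse of every ROI link added."""
    closed = set(store)
    for s, p, o in store:
        if p in INVERSES:
            closed.add((o, INVERSES[p], s))
    return closed


def misused_roi_links(store: set) -> list:
    """Flag ac:hasROI statements whose subject is not a known media item."""
    return [t for t in store if t[1] == "ac:hasROI" and t[0] not in MEDIA_ITEMS]


print(sorted(with_inverses(triples)))
print(misused_roi_links({("ex:occurrence1", "ac:hasROI", "ex:roi1")}))
```

The second call flags the "occurrence ac:hasROI ROI" pattern that the comment advises against.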
Thanks @baskaufs!
Regarding rightsHolder, I'll see whether I'll update that. So far it was useful that we could basically copy/paste the definition and link to a term people know, but I see your point.
Regarding regions of interest: you read it correctly. occurrence dwc:hasEvidence ROI would be more apt, and I won't consider occurrences ROIs then. I won't use ac:associatedObservationReference because we don't make the link in that direction (image -> observation), only observation -> (sequence of) images?
Two ☑️ points (3 and 8) remain in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022173750; suggestions welcome (they concern Subject Category and providerManagedID).
Question about narrowMatch and broadMatch. From Wikipedia (emphasis mine):
The property related simply makes an association relationship between two concepts; no hierarchy or generality relation is implied. The properties broader and narrower are used to assert a direct hierarchical link between two concepts. The meaning may be unexpected; the relation A broader B means that A has a broader concept called B—hence that B is broader than A. Narrower follows in the same pattern.
...
SKOS mapping properties are intended to express matching (exact or fuzzy) of concepts from one concept scheme to another, and by convention are used only to connect concepts from different schemes. The concepts relatedMatch, broadMatch, and narrowMatch are a convenience, with the same meaning as the semantic properties related, broader, and narrower. (See previous section regarding the meanings of broader and narrower.)
So is it:
Dog broadMatch Animal # Dog has a broader concept called Animal
Animal narrowMatch Dog # Animal has a narrower concept called Dog
Or the other way around?
Ah I see. So yes, you have stated it the correct way round.
Aha, all the narrow/broad matches in Camtrap DP are incorrect and should be reversed, including those mentioned in https://github.com/tdwg/camtrap-dp/issues/191#issuecomment-1022357625
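A small helper that encodes the reading from the quoted passage: given which of two concepts is the broader one, it returns the SKOS mapping triple to use with a given subject. This is an illustrative sketch, not part of any SKOS library; the function name is hypothetical.

```python
def hierarchical_match(subject: str, obj: str, broader: str) -> tuple:
    """Build a SKOS mapping triple between `subject` and `obj`.

    `broader` names whichever of the two concepts is the broader one. Per SKOS,
    "A skos:broadMatch B" asserts that B is broader than A, and
    "A skos:narrowMatch B" asserts that B is narrower than A.
    """
    if broader == obj:
        return (subject, "skos:broadMatch", obj)
    if broader == subject:
        return (subject, "skos:narrowMatch", obj)
    raise ValueError("broader must be the subject or the object")


print(hierarchical_match("Dog", "Animal", broader="Animal"))
# ('Dog', 'skos:broadMatch', 'Animal')
print(hierarchical_match("Animal", "Dog", broader="Animal"))
# ('Animal', 'skos:narrowMatch', 'Dog')
```

With mediaID as the subject and http://purl.org/dc/terms/identifier as the broader concept, the helper yields skos:broadMatch, which is the reversal noted above.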
For information, at the most recent Audubon Core meeting we decided that something we should focus on is dealing with collections of media items (e.g. a single camera trap deployment would fit this) and also the ordering of media items within such a collection. We have scheduled another meeting for 6th April; would you like to share any thoughts/requirements?
OK, I'm going to take up point 3. The Iptc4xmpExt:CVterm has always seemed a bit weird to me and a bit of a kludge. The term isn't really being used in the manner recommended in its definition, the recommendation of adding ac:subjectCategoryVocabulary to help people figure out what the term means isn't guaranteed to work, and even looking up some of the suggested vocabularies is difficult (some only have a search box and no easily accessible list of terms, one seems to only be available in German, etc.).
It seems to me that what might be good is to just fix this on the Audubon Core side. At the time the vocabulary was originally developed, nobody in TDWG had figured out how to actually describe a controlled vocabulary in a standardized way. We now know how to do that. What makes sense to me is to just go to all of the recommended vocabularies, try to get a dump of all of their terms, and create a single list somewhere in the Audubon Core space to which people could refer. I'm not sure this would need to be adopted as an official TDWG controlled vocabulary, but it would serve more as an official "lookup dictionary" for the recommended vocabularies.
As part of that process, I would be interested to know how many of the controlled strings from the different vocabularies actually overlap. If there are none or few, then we could just direct people to this list and tell them to use the strings we have there. We could then skip the whole thing of trying to provide (unreliable) information about the controlled vocabularies via the ac:subjectCategoryVocabulary term.
I would be willing to give a shot at trying to grab the data and create such a list. It would be annoying to do, but not that difficult. I probably don't have time to do it in the next week or two, but could probably do it on the time scale of the next few months.
Thoughts @edwbaker ?
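The overlap check proposed here could look something like the sketch below: collect the controlled strings of each vocabulary and report the ones that occur in more than one. The vocabulary names and strings are placeholders, not the actual AC-recommended vocabularies, and matching is exact (only surrounding whitespace is stripped, no case folding).

```python
from collections import defaultdict


def build_lookup(vocabularies: dict) -> dict:
    """Map each controlled string to the sorted names of the vocabularies that use it."""
    sources = defaultdict(set)
    for vocab_name, terms in vocabularies.items():
        for term in terms:
            sources[term.strip()].add(vocab_name)
    return {term: sorted(names) for term, names in sources.items()}


def overlapping(lookup: dict) -> dict:
    """Keep only strings that occur in more than one vocabulary (the ambiguous ones)."""
    return {term: names for term, names in lookup.items() if len(names) > 1}


# Placeholder data: NOT the real AC-recommended vocabularies.
vocabularies = {
    "vocabA": ["flower", "fruit", "habitat"],
    "vocabB": ["flower", "leaf"],
}
lookup = build_lookup(vocabularies)
print(overlapping(lookup))  # {'flower': ['vocabA', 'vocabB']}
```

If the overlapping set turns out empty or tiny, the single lookup list could be used as-is, which is the scenario described above.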
With respect to point 8, it seems to me that ac:providerManagedID could be considered an exact match in the circumstance where the subject resource is a media item. No TDWG vocabularies have official domain declarations, but I think in the case of Audubon Core, there is a sort of underlying assumption that the subject resource is a media item. There are certainly cases in other vocabularies where a property is appropriated for use with a subject other than the one originally intended. But in this case it would seem somewhat weird to me to use that term with something that isn't a media item.
For observations, it seems like dwc:occurrenceID would be a close match.
I don't think there is anything existing in TDWG currently that would correspond to a deployment ID. It seems like this would be more in the realm of Humboldt Core, but there isn't anything in the most recent list of terms that seems similar. Humboldt Core is really more about describing surveys than about creating metadata terms for the surveys themselves, so that's probably why they haven't gotten granular enough to have a term like deployment ID.
@baskaufs - having Iptc4xmpExt:CVterm and ac:subjectCategoryVocabulary does seem a bit less than ideal. It's definitely worth having a look at what we can do.
Possibly we should discuss minting ac:subjectCategory and ac:subjectCategoryLiteral (with only the latter needing ac:subjectCategoryVocabulary)? It looks like the AC usage of CVterm might push the limits of how IPTC defines it.
It would be useful for me to get insights into the original AC thinking, if that's still available.
Yes, I can try to dig through the past history to see if there was any discussion on that topic. I remember that when the discussion/review was going on, I think I asked why they didn't just use dc:/dcterms:subject, which seemed better known. But there was some reason for not doing that and I don't remember what it was. It's possible that might be an option instead of minting new terms in the ac: namespace.
There was a lot of the discussion that was archived in the old Google Code site. I will rummage around and see if the issues tracker there is still viewable. The discussion probably would be there.
Here's the tracker: https://code.google.com/archive/p/auduboncore/issues I see immediately that there was some discussion about proper use of the term: https://code.google.com/archive/p/auduboncore/issues/88 but it was after ratification, I think.
Hmmm. Well, I've looked through the tracker and also searched my emails from 2011 to 2013 while I was review manager and I can't find anything by a visual scan of subject lines or by searching the text of the emails for "dcterms:subject". So I may just be misremembering or it may have been part of a discussion pre-ratification when they were actually building AC. The Google Code issues don't go back into that era, so this may just be lost information.
The only other thing I can think of is to ask Gregor Hagedorn or Annette Olson if they remember anything about why dcterms:subject wasn't used. Neither is actively involved in TDWG any more, but they were probably the most active of the Task Group members doing the actual editing. Vijay Barve was also on the task group and is active in AC now, so we could also ask him if he remembers anything about this.
Thanks for checking. I think for now I should create a new issue on the ac repository as something to discuss at the next ac meeting?
I'm not sure this would need to be adopted as an official TDWG controlled vocabulary, but it would serve more as an official "lookup dictionary" for the recommended vocabularies.
As part of that process, I would be interested to know how many of the controlled strings from the different vocabularies actually overlap. If there are none or few, then we could just direct people to this list and tell them to use the strings we have there. We could then skip the whole thing of trying to provide (unreliable) information about the controlled vocabularies via the ac:subjectCategoryVocabulary term.
I would be willing to give a shot at trying to grab the data and create such a list. It would be annoying to do, but not that difficult. I probably don't have time to do it in the next week or two, but could probably do it on the time scale of the next few months.
@baskaufs
I'm actively working on a similar activity with the exact same question in a different context. I'd be happy to assist if there's a fit.
Thanks @ben-norton ! I was thinking that I'd just try to download the most machine-readable version of each of the vocabularies and munge them into a CSV that could be used as-is and also be transformed into JSON-LD. If you have a less labor-intensive idea on how to do it, I'd love to talk about it.
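One possible shape for the CSV-to-JSON-LD step, as a sketch using only the Python standard library. The column names (scheme, label, iri), the example IRIs, and the choice to model each controlled string as a skos:Concept are all assumptions for illustration, not a decided format.

```python
import csv
import io
import json

# Hypothetical CSV layout for the combined lookup list: one row per controlled
# string, with the IRI of its source vocabulary (scheme) and of the term itself.
CSV_TEXT = """scheme,label,iri
https://example.org/vocabA,flower,https://example.org/vocabA/flower
https://example.org/vocabB,leaf,https://example.org/vocabB/leaf
"""


def csv_to_jsonld(csv_text: str) -> dict:
    """Convert CSV rows into a JSON-LD document describing SKOS concepts."""
    rows = csv.DictReader(io.StringIO(csv_text))
    graph = [
        {
            "@id": row["iri"],
            "@type": "skos:Concept",
            "skos:prefLabel": row["label"],
            "skos:inScheme": {"@id": row["scheme"]},
        }
        for row in rows
    ]
    return {
        "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
        "@graph": graph,
    }


print(json.dumps(csv_to_jsonld(CSV_TEXT), indent=2))
```

Keeping the CSV as the single source and generating JSON-LD from it means the list stays usable as-is in a spreadsheet while still being machine-readable as linked data.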
Thanks @baskaufs! Regarding observationType (point 3), it seems that for now we just sit tight and wait until this is resolved in AC. Good that an issue will be created there, because this issue is close to being closed.
Regarding _id (point 8), I'll do an exactMatch with https://ac.tdwg.org/termlist/#ac_providerManagedID for media. I won't map _id in deployments or observations, since it is (in all three cases) a:
Unique identifier of the [deployment/observation/media file] as assigned by the data management system.
ac:providerManagedID fits that definition, but there are no equivalents for deployments and observations. Note that the "public" observationID and deploymentID are already mapped to dwc:occurrenceID and dwc:eventID respectively.
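For reference, the crosslinks accepted in this thread can be summarized as data. This sketch assumes SKOS semantics with the Camtrap DP field as subject, so a field that is narrower than its target gets skos:broadMatch. It is not copied from the Camtrap DP schema files (the structure and function name are hypothetical), and tags/ac:tag is left out because the thread doesn't pin down which side is broader.

```python
# Accepted AC/DC crosslinks from this thread as
# (Camtrap DP field, how the field relates to the target, target term IRI).
# "narrower" means the Camtrap DP field is the more specific concept.
# tags/ac:tag is left out: the thread doesn't say which side is broader.
AC = "http://rs.tdwg.org/ac/terms/"

CROSSLINKS = [
    ("mediaID", "narrower", "http://purl.org/dc/terms/identifier"),
    ("comments", "exact", AC + "comments"),  # media comments only
    ("_id", "exact", AC + "providerManagedID"),  # media _id only
    ("timestamp", "exact", "http://ns.adobe.com/xap/1.0/CreateDate"),  # media timestamp
    ("cameraModel", "narrower", AC + "captureDevice"),
    ("captureMethod", "narrower", AC + "resourceCreationTechnique"),
    ("filePath", "narrower", AC + "accessURI"),
]

# With the field as subject, "field skos:broadMatch target" says the target is
# broader, i.e. the field is the narrower concept.
SKOS_PROPERTY = {"exact": "skos:exactMatch", "narrower": "skos:broadMatch"}


def skos_annotations(crosslinks: list) -> dict:
    """Group the crosslinks per field as {SKOS property: target IRI}."""
    annotations = {}
    for field, relation, target in crosslinks:
        annotations.setdefault(field, {})[SKOS_PROPERTY[relation]] = target
    return annotations


for field, props in skos_annotations(CROSSLINKS).items():
    print(field, props)
```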
AC links implemented in #198. More (like observationType) can be added in the future.
As we did with Dublin Core and Darwin Core, crosslink to Audubon Core for terms that are equivalent and defined there.