tdwg / dwc-for-biologging

Darwin Core recommendations for biologging data
Creative Commons Attribution 4.0 International

Express camtrap-dp as occ + multimedia, rather than event + occ #35

Closed peterdesmet closed 3 years ago

peterdesmet commented 3 years ago

I first mapped camtrap data to Darwin Core as an Event core (deployment) + Occurrence extension (observations): see here for the output.

But because of the star schema, that doesn't allow mapping images to occurrences, except maybe with the extended schema, which is not supported by GBIF. As advised by @timrobertson100, I have now mapped the data as an Occurrence core (observations) + Multimedia extension (images), which allows including the evidence for the occurrence. The only information we lose is the deployment start/end and the deployments that did not generate (animal) observations. But that could always be found in the original data.
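A minimal sketch of the Occurrence core + Multimedia extension mapping, not the actual conversion code. The input structure (`observation_id`, `deployment_id`, `images`), the `coreid` column name and the example.org URLs are assumptions for illustration; the output keys are Darwin Core and GBIF Simple Multimedia terms.

```python
# Hypothetical input: one camera trap observation with the images that document it.
observations = [
    {
        "observation_id": "obs-1",
        "deployment_id": "dep-1",
        "timestamp": "2020-02-11T21:57:50Z",
        "scientific_name": "Vulpes vulpes",
        "images": [
            {"url": "https://example.org/img/001.jpg",
             "license": "https://creativecommons.org/licenses/by/4.0/"},
            {"url": "https://example.org/img/002.jpg",
             "license": "https://creativecommons.org/licenses/by/4.0/"},
        ],
    },
]


def to_occurrence_core(obs):
    """Build one Occurrence core row from an observation."""
    return {
        "occurrenceID": obs["observation_id"],
        "eventID": obs["deployment_id"],  # keeps the link to the deployment
        "basisOfRecord": "MachineObservation",
        "eventDate": obs["timestamp"],
        "scientificName": obs["scientific_name"],
    }


def to_multimedia_rows(obs):
    """Build one Multimedia extension row per image, pointing back to the core row."""
    return [
        {
            "coreid": obs["observation_id"],  # assumed name for the core ID column
            "type": "StillImage",
            "format": "image/jpeg",
            "identifier": img["url"],
            "license": img["license"],  # per-image metadata survives in the extension
        }
        for img in obs["images"]
    ]


occurrences = [to_occurrence_core(o) for o in observations]
multimedia = [row for o in observations for row in to_multimedia_rows(o)]
```

Each image row carries its own license and points at exactly one occurrence, which is the link the Event core + Occurrence extension layout cannot express.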

peterdesmet commented 3 years ago

Feedback welcome! I'll merge this if no one responds within a week, as discussion can always happen later too.

timrobertson100 commented 3 years ago

Thanks @peterdesmet

> As advised by @timrobertson100 I have now mapped the data as an Occurrence core (observations) + Multimedia extension (images), which allows including the evidence for the occurrence.

My rationale was to focus on the richest common data GBIF could support, while knowing the original dataset always provides the full view. It is also the format used in Publishing Camera Trap Data: A Best Practice Guide.

albenson-usgs commented 3 years ago

Realize this is already resolved but why can't you use associatedMedia with Event Core with Occurrence extension?

peterdesmet commented 3 years ago

@albenson-usgs yes, using Event Core + Occurrence extension (with an associatedMedia field) would also be a valid solution. That field would then be a concatenated list of image URLs. That is a bit harder to parse and does not allow adding more metadata for the images. @timrobertson100 any feedback on this?
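To illustrate the trade-off, here is a small sketch of what packing and unpacking associatedMedia could look like. The helper names and the example.org URLs are hypothetical; the ` | ` separator follows the Darwin Core recommendation for list values.

```python
SEPARATOR = " | "  # Darwin Core recommends space-pipe-space for lists in one field


def build_associated_media(urls):
    """Concatenate image URLs into a single associatedMedia value."""
    return SEPARATOR.join(urls)


def parse_associated_media(value):
    """Split an associatedMedia value back into URLs, tolerating uneven spacing."""
    return [part.strip() for part in value.split("|") if part.strip()]


urls = [
    "https://example.org/img/001.jpg",
    "https://example.org/img/002.jpg",
]
field = build_associated_media(urls)
recovered = parse_associated_media(field)
```

Only the URLs survive the round trip: there is no place in the field for a per-image license, capture time or media type.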

wardappeltans commented 3 years ago

@peterdesmet why not EventCore + Multimedia extension? And keep all events and include absences.

The Resource Relationship Extension could be a solution too?

peterdesmet commented 3 years ago

@wardappeltans: why not EventCore + Multimedia extension?

Because I would like to link multimedia to the occurrences - which breaks the star schema if there is an event core - without making it too complicated. If anyone has easy solutions to solve this (that GBIF can parse), we can take that approach.

I think there are two main approaches:

  1. Deployment oriented: deployments as events core, with related images and occurrences. This allows expressing absence data and would include lots of images without species (less useful for biodiversity portals). It does not easily allow expressing the evidence for occurrences (no easy link between occ and images).
  2. Occurrence oriented: occurrences as core, with related images as their evidence. This reduces the dataset to the biological information.

I think it is fine to take lossy approach 2, since approach 1 is already available in the source dataset.
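What approach 2 drops can be made concrete with a short sketch. The record structure and IDs below are made up for illustration, not taken from the dataset.

```python
# Hypothetical source data: two deployments, only one of which recorded an animal.
deployments = [
    {"deployment_id": "dep-1", "start": "2020-02-01", "end": "2020-02-29"},
    {"deployment_id": "dep-2", "start": "2020-03-01", "end": "2020-03-31"},
]
observations = [
    {"observation_id": "obs-1", "deployment_id": "dep-1"},
]


def deployments_without_observations(deployments, observations):
    """Return the IDs of deployments that disappear in an occurrence-oriented export."""
    observed = {obs["deployment_id"] for obs in observations}
    return [
        dep["deployment_id"]
        for dep in deployments
        if dep["deployment_id"] not in observed
    ]


lost = deployments_without_observations(deployments, observations)
```

Here `dep-2` would be absent from the published dataset, along with the start/end dates of both deployments; all of that remains available in the source data.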

albenson-usgs commented 3 years ago

I like approach 2 as you describe it Peter. That is, after all, what we're here for. The only reason I suggested Event Core with Occurrence Extension and associatedMedia is I've seen Occurrence Core with associatedMedia used well in the NOAA Deep Sea Corals database so thought it might work well here too.

peterdesmet commented 3 years ago

@timrobertson100 how does GBIF process associatedMedia?

Can it be used to link to media URLs, e.g.

https://www.agouti.eu/api/uploads/deployment-images/20191001132016-untitled/05c05f7b-6c38-4a06-a3b0-145de226c8ad/20200211215750-RCNX0052.JPG | https://www.agouti.eu/api/uploads/deployment-images/20191001132016-untitled/05c05f7b-6c38-4a06-a3b0-145de226c8ad/20200211215757-RCNX0103.JPG | etc.

Can it be used to link to IDs in a multimedia extension?

04cf6dfe-d954-4687-89a2-0155abd3f694 | 6c636a01-f14f-47eb-8e47-57e8cdf12a5a | etc.

timrobertson100 commented 3 years ago

> Can it be used to link to media URLs

Yes, so you could have event core and do this in occurrence extensions. The downside is you miss image metadata (including the ability to convey a license, which is usually the blocker).

> Can it be used to link to IDs in a multimedia extension?

No

peterdesmet commented 3 years ago

Any feedback from others on what approach to take? What do we value more for camtrap data at GBIF/OBIS: deployments (including empty ones) or image metadata (including license)?

peggynewman commented 3 years ago

If that's the choice we have to make then I think the image metadata wins.

It's disappointing to exclude the deployment info. I'm assuming that a frictionless data package would allow us to break the star schema and allow more complex relationships, but we're a while off GBIF/OBIS support for that. So in making this recommendation we'd be explicit about the original dataset providing the full view.

pieterprovoost commented 3 years ago

Maybe this is a stretch, but what about using dcterms:rights in the occurrence extension if the image license is the main blocker?

peterdesmet commented 3 years ago

@pieterprovoost dcterms:rights is generally interpreted as applying to the data, not the images.

Overall, I think the general consensus is that although we don't like losing deployment information, it is still better to express the images (a very important part of camera trap data) in an extension, rather than trying to cram these in associatedMedia.

I have now updated the README with this summary and referenced the camera trap best practices guide.

ssmosstee commented 3 years ago

It sounds to me like the only way we will be able to utilise the multimedia extension with an event core is if it undergoes a similar process as the eMoF to associate to both an occurrence and an event.

Another question that comes up is whether the star schema is capable of having more than two extension files relate back to a core. Ideally, I would like to represent my data with a combination of Event Core and Occurrence, eMoF, and Multimedia extensions. Can the multimedia fields be contained within the eMoF extension to circumvent this blocker?

peterdesmet commented 3 years ago

@ssmosstee on your first point: you are right, an eMoF-like multimedia extension would allow us to have an event core. Now that I have published camtrap data to GBIF (https://doi.org/10.15468/5tb6ze), I'm not sure we need an Event core. By adding an eventID to the occurrence core, GBIF will group those, so the deployments are retained. The only information we can't express is the deployment start/end, but that information would be overwritten by the more precise occurrence eventDate anyway when flattening event + occurrences.
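As a rough illustration of that grouping (not how GBIF implements it), occurrences sharing an eventID can be collected back into deployments. The records below are invented for the example.

```python
from collections import defaultdict

# Hypothetical Occurrence core rows; eventID holds the deployment identifier.
occurrences = [
    {"occurrenceID": "obs-1", "eventID": "dep-1", "eventDate": "2020-02-11T21:57:50Z"},
    {"occurrenceID": "obs-2", "eventID": "dep-1", "eventDate": "2020-02-12T03:10:00Z"},
    {"occurrenceID": "obs-3", "eventID": "dep-2", "eventDate": "2020-03-01T18:00:00Z"},
]


def group_by_event(occurrences):
    """Map each eventID to the occurrenceIDs recorded during that deployment."""
    groups = defaultdict(list)
    for occ in occurrences:
        groups[occ["eventID"]].append(occ["occurrenceID"])
    return dict(groups)


by_deployment = group_by_event(occurrences)
```

The grouping recovers which observations belong to which deployment, but not when the deployment started or ended, and deployments with no observations never appear.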

On two: yes, you can relate many extension files to your core. Checklists are a good example of that: a taxon core + vernacular name extension, distribution extension, ... all linking back to your taxon core. For your use case: if you only want to relate the MoF and multimedia to occurrences, you're better off with an Occurrence core. If you want to relate to events and occurrences, it does get complicated.
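A sketch of the star schema constraint with an Occurrence core and two extensions: every extension row may only point at a core row. The `coreid` column name, the extension names used as dictionary keys and the sample values are assumptions for illustration.

```python
# Hypothetical core IDs (occurrenceID values in an Occurrence core).
core_ids = {"occ-1", "occ-2"}

# Any number of extensions can hang off the core, each row pointing at one core ID.
extensions = {
    "multimedia": [
        {"coreid": "occ-1", "identifier": "https://example.org/img/001.jpg"},
    ],
    "measurementorfact": [
        {"coreid": "occ-2", "measurementType": "count", "measurementValue": "3"},
    ],
}


def orphaned_rows(core_ids, extensions):
    """Return, per extension, the rows whose coreid matches no core record."""
    return {
        name: [row for row in rows if row["coreid"] not in core_ids]
        for name, rows in extensions.items()
    }


orphans = orphaned_rows(core_ids, extensions)
```

With an Event core, the multimedia rows would have to point at events instead, which is why linking images to individual occurrences gets complicated in that layout.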