tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Change term - relationshipOfResource #194

Closed peterdesmet closed 2 years ago

peterdesmet commented 6 years ago

Change term

Current Term definition: https://dwc.tdwg.org/terms/#dwc:relationshipOfResource

Proposed new attributes of the term:

Original comment:

The problem

The definition of relationshipOfResource is:

The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary.

But the intention was always to go from (@tucotuco concurs):

resourceID (subject A) to relatedResourceID (object B)

Which is logical. The definition however goes in the other direction:

The relationship of the resource identified by relatedResourceID (B) to the subject (A) (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary.

That makes it very unintuitive, and the listed examples don't help (because e.g. host to could be interpreted in either direction). See for example how we struggled with it here: https://github.com/trias-project/uredinales-belgium-checklist/issues/8#issue-302633347:

The resource relationship between rust fungi A and host B could be expressed as:

| id | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo |
|---|---|---|---|---|
| A | (A) | B | host plant of | = bibliographicReference or source |
| A | B | A | parasite of | = bibliographicReference or source |

It is unfortunate that the most logical relationship is host plant of, which is less specific than parasite of, but that follows from the definition of relationshipOfResource.

Updating the definition

Can we still amend the definition please? E.g. to:

The relationship of the subject (optionally identified by resourceID) to the object (identified by relatedResourceID). Recommended best practice is to use a controlled vocabulary.
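To make the two directions concrete, here is a minimal sketch of how the same row reads under each definition. The class and function names are hypothetical, the identifiers "A" and "B" stand for the rust fungus and its host from the example above, and the "current" reading is the literal one described in this issue, not an official interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRelationship:
    """A subset of the ResourceRelationship extension terms (illustrative only)."""
    resource_id: Optional[str]     # dwc:resourceID, the subject (optional)
    related_resource_id: str       # dwc:relatedResourceID
    relationship_of_resource: str  # dwc:relationshipOfResource


def read_proposed(row: ResourceRelationship, core_id: str) -> str:
    """Read the row as subject -> object, following the proposed definition."""
    subject = row.resource_id or core_id  # fall back to the core record id
    return f"{subject} {row.relationship_of_resource} {row.related_resource_id}"


def read_current(row: ResourceRelationship, core_id: str) -> str:
    """Read the row as related resource -> subject (literal current wording)."""
    subject = row.resource_id or core_id
    return f"{row.related_resource_id} {row.relationship_of_resource} {subject}"


row = ResourceRelationship("A", "B", "parasite of")
print(read_proposed(row, core_id="A"))  # A parasite of B
print(read_current(row, core_id="A"))   # B parasite of A
```

Under the proposed wording, a data publisher can write the row in the same order they would say the sentence, which is the intuition the request is trying to restore.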

Breaking things

Obviously, people have probably done their best to understand the definition and have been using it. Changing the direction of the relationship can break things. But:

I'd rather have this fixed now, assuming adoption is still low.

peterdesmet commented 6 years ago

Any input on how to move this forward?

tucotuco commented 4 years ago

I believe this to be a serious lingering problem. I think there is consensus that the correct way to make the relationships is as proposed. The change has definite semantic implications, so it can't just be done as an erratum despite what the original intention of the term definition was. We have three criteria that need to be satisfied for a term change to go forward. The first one is demand. We need to get multiple independent parties saying this change needs to be made. The second one is the efficacy requirement. That should be easy to satisfy - the proposed change would actually make sense and be clearer than what we have now. The third one is the stability requirement, which is why we need to know what is already published, how the term was used in these cases, and whether those affected would be willing to make the adjustment once ratified. Maybe the first and third requirements can be solved with one set of communications.

We need someone to volunteer to take this forward.

debpaul commented 4 years ago

I think this form is used at iDigBio when providers change identifiers. This has always confused me. Great to see (in a sense) that I was / am not alone. Quentin Groom nicely illustrated this in the paper on establishment means, etc. https://biss.pensoft.net/lib/ajax_srv/generate_pdf.php?document_id=38084&readonly_preview=1&file_id=0

I’m not at all sure how we will be able to a) locate all who are using the extension, or b) check to see if their current implementation is as intended or backwards.

For a), we need this in general — for many reasons. At any aggregator — we (users, data providers, developers), need an easy way to see / find all who are using extensions. The UI needs to show this plainly. You would then have easy access to who is using and how.


peterdesmet commented 4 years ago

I asked @timrobertson100 to give a breakdown of the number of GBIF-mediated datasets using the resourceRelationshipExtension per dataset type (occ/checklist) and publisher, so we have an idea of the current use.

peterdesmet commented 4 years ago

Also, it would be great if this issue is tackled in the same swoop as #186.

jhpoelen commented 4 years ago

To me, the proposed definition change is not semantically different. The original definition is just worded in a funny way.

My attempt to parse the original definition:

The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary.

the object: relatedResourceID

the verb: the relationship

the (optional) subject: resourceID

To me, the phrase "identified by relatedResourceID to the subject" indicates that the relatedResourceID is not the subject, but the thing that relates to the subject (i.e. the object).

So, as far as I can tell, the proposed updated definition makes the existing one clearer.

jhpoelen commented 4 years ago

Also, I suggest including the identifier for the relationship (or relationship id, see #186) in the example.

For instance, for parasite of add a relationship id http://purl.obolibrary.org/obo/RO_0002444 .
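A minimal sketch of what this could look like in practice, assuming the identifier is carried in a column named relationshipOfResourceID (the term name used for the related proposal later in this thread). The lookup table and function name are hypothetical; only the "parasite of" IRI comes from the comment above.

```python
# Label -> IRI lookup. Only the "parasite of" entry comes from this thread;
# a real mapping would be drawn from an agreed controlled vocabulary.
RELATIONSHIP_IDS = {
    "parasite of": "http://purl.obolibrary.org/obo/RO_0002444",
}


def with_relationship_id(row: dict) -> dict:
    """Return a copy of the row with relationshipOfResourceID added when known."""
    enriched = dict(row)
    label = row["relationshipOfResource"].strip().lower()
    iri = RELATIONSHIP_IDS.get(label)
    if iri is not None:
        enriched["relationshipOfResourceID"] = iri
    return enriched


example = with_relationship_id(
    {"resourceID": "A", "relatedResourceID": "B", "relationshipOfResource": "parasite of"}
)
print(example["relationshipOfResourceID"])
```

Labels without a known identifier pass through unchanged, so free-text relationships are not lost.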

jhpoelen commented 4 years ago

In addition, GloBI is indexing many collections that use the resource relationship extension. Also, I am aware that most EMu users (e.g., Field Museum, Smithsonian, Yale Peabody) are working with their vendor to put their relations into the resource relationship. If needed, I can provide more context, or perhaps Kate W. @magpiedin can chime in.

magpiedin commented 4 years ago

👍 & agreed on the above/clarifying the definition

If usage-details from the Field Museum side help: We're adding Resource Relationship extensions to a few of the FMNH IPT resources (e.g., Insects dataset). Data-entry is shifting to this interim workflow (which lines up with @peterdesmet 's clarification above) while Axiell works on development. (& Axiell's been pretty responsive on accommodating this standard in EMu -- @fmjjones & @rondlg for details.)

As a side-note, a lot of the currently published relationships in FMNH data are also specimen-to-taxon, which might look odd without the 'scientificName' field to map to in our current IPT version's RR extension. (unless i missed a step? for now we included taxa in relationshipRemarks, formatted as "[other remarks] | scientificName: [taxon]")
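For consumers of such data, a small sketch of how the "[other remarks] | scientificName: [taxon]" convention could be unpacked. The function name and the sample values are hypothetical, and the parsing assumes the taxon is always the last pipe-separated segment, as in the format quoted above.

```python
from typing import Optional, Tuple

MARKER = "scientificName:"


def split_remarks(remarks: str) -> Tuple[str, Optional[str]]:
    """Split '[other remarks] | scientificName: [taxon]' into (remarks, taxon)."""
    head, sep, tail = remarks.rpartition("|")
    tail = tail.strip()
    if sep and tail.startswith(MARKER):
        taxon = tail[len(MARKER):].strip()
        return head.strip(), taxon or None
    # No taxon segment found: return the remarks untouched.
    return remarks.strip(), None


# Hypothetical sample value, not taken from FMNH data.
print(split_remarks("collected from host | scientificName: Quercus alba"))
```

Remarks that do not follow the convention are returned as-is with no taxon, so the function can be applied to mixed data.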

tucotuco commented 3 years ago

Change term

Proposed new attributes of the term:

baskaufs commented 3 years ago

As noted in the comments for the dwc:relationshipOfResourceID proposal, the RDF guide did not attempt to model ResourceRelationship data, so no implications here for dwciri: terms.

MattBlissett commented 3 years ago

> I asked @timrobertson100 to give a breakdown of the number of GBIF-mediated datasets using the resourceRelationshipExtension per dataset type (occ/checklist) and publisher, so we have an idea of the current use.

Part of this is now easy to answer, as the GBIF occurrence search index now includes information on available extensions.

https://www.gbif.org/occurrence/charts?advanced=1&dwca_extension=http:~2F~2Frs.tdwg.org~2Fdwc~2Fterms~2FResourceRelationship

curl -Ss 'https://api.gbif.org/v1/occurrence/search?dwca_extension=http://rs.tdwg.org/dwc/terms/ResourceRelationship&limit=0&facet=datasetKey&facetLimit=10000' | jq '.facets[].counts | length'
228 datasets

curl -Ss 'https://api.gbif.org/v1/occurrence/search?dwca_extension=http://rs.tdwg.org/dwc/terms/ResourceRelationship&limit=0&facet=publishingOrg&facetLimit=10000' | jq '.facets[].counts | length'
31 publishing organizations

The dataset type isn't recorded in the occurrence API, so:

for d in $(curl -Ss 'https://api.gbif.org/v1/occurrence/search?dwca_extension=http://rs.tdwg.org/dwc/terms/ResourceRelationship&limit=0&facet=datasetKey&facetLimit=10000' | jq -r '.facets[].counts[].name'); do curl -Ss https://api.gbif.org/v1/dataset/$d | jq -r '.type'; done | sort | uniq -c
    228 OCCURRENCE

These are all occurrence datasets.

hollyel commented 3 years ago

It would be helpful to include a paleo-specific example (e.g., a relationship that defines fossil specimens preserved on the same slab) to encourage adoption of this extension by the paleo collections community. Our understanding is that changes to examples could be reviewed and added with relative ease after this review process. Assuming that is true, we would like to discuss within our working group and come up with a gold star example to add later. - Holly Little, Erica Krimmel (@ekrimmel) and Talia Karim (@tkarim) on behalf of the Paleo Data Working Group

jhpoelen commented 3 years ago

@hollyel I am glad to see a pro-active stance from y'all paleo folks. @zedomel and Maarten Treekels (Meise Botanical Garden) are putting together a TDWG 2021 Symposium on Interactions (running title: eat or be eaten: don't miss out on interaction data). I hope that you can work with them to share your use cases as examples during the symposium to facilitate discussion on how the paleo community can use the ResourceRelationship extension.

Question - would you consider paleo samples on the same plate equivalent to saying that they co-occurred or were ecologically related?

Also, if you already have specific examples of paleo interactions in existing collections (preferably your own), I'd be happy to help index these via GloBI to raise awareness. Right now, the only GloBI indexed paleo dataset that I am aware of is a dinosaur diet study that @KatjaSchulz transcribed from literature (see attached screenshots from https://www.globalbioticinteractions.org/?accordingTo=globi%3AKatjaSchulz%2Fdinosaur-biotic-interactions&interactionType=ecologicallyRelatedTo and https://www.globalbioticinteractions.org/browse?accordingTo=globi%3AKatjaSchulz%2Fdinosaur-biotic-interactions&interactionType=ecologicallyRelatedTo)

Also see related change request #186 (older discussion), https://github.com/tdwg/dwc/issues/283 (formal proposal), and https://github.com/tdwg/interaction/issues/23 .

[Screenshots: 2021-05-28 08-10-36 and 2021-05-28 08-10-08]

hollyel commented 3 years ago

Thanks @jhpoelen. It would be great to discuss the application of the ResourceRelationship extension to paleo data in more detail. I’ll keep an eye out for that symposium. I can’t point to an existing documented example within the dataset I manage (NMNH Paleo - I don’t think we share much, if any, of that data at the moment), but I will work on pulling examples that cover some of our primary use cases. For the most part I would say yes to your question. I’m sure there are some exceptions though. We sometimes use the term ‘community slab’ to refer to these types of specimens. There can of course be additional layers of interaction as well (e.g. fossil evidence of insect/plant interactions).

tkarim commented 3 years ago

@jhpoelen getting at your question about fossil specimens associated on a slab... it depends. :) Sometimes you clearly have an ecological assemblage that was rapidly buried, preserving an occurrence of a group of organisms (e.g. slabs of molting trilobites). You can usually identify this by the preservation and also the nature of the overlying sediments. In other cases you have time-averaged accumulations of individuals (that are often disarticulated), which can be identified again by the level of disarticulation, the nature of the sediment, and the amount of reworking that is shown. Another important example here for fossils that hasn't been mentioned is lumps of amber that have multiple inclusions (e.g. you could have insects, plants, and air bubbles all in one piece of amber). The lumps are typically sliced up for imaging and/or other sampling, but the original ecological association of the various inclusions is very important to have recorded somewhere.

EstebanMH-SiB commented 3 years ago

We endorse this proposal on behalf of @SiBColombia

tkarim commented 3 years ago

@jhpoelen also... in response to your question about interactions... we have TONS in paleo collections that we are just starting to document. A lot of encrusters (e.g. bryozoans on bivalves), mollusks with bore holes, insect feeding traces on fossil plants, insect galls on fossil plants. This might make a good Paleo Data Happy Hour session for the fall.

jhpoelen commented 3 years ago

@tkarim looking forward to seeing all the rich paleo interaction data (🦖 eats 🦕?) become easier to access. Please let me know how I can help and/or if you have (prototype) examples of paleo records with recorded associations in existing DwC-A.

tucotuco commented 3 years ago

> It would be helpful to include a paleo-specific example (e.g., a relationship that defines fossil specimens preserved on the same slab) to encourage adoption of this extension by the paleo collections community. Our understanding is that changes to examples could be reviewed and added with relative ease after this review process. Assuming that is true, we would like to discuss within our working group and come up with a gold star example to add later. - Holly Little, Erica Krimmel (@ekrimmel) and Talia Karim (@tkarim) on behalf of the Paleo Data Working Group

@hollyel @ekrimmel @tkarim If you come up with the example before the Maintenance Group makes the next release (work beginning on that now), we can include it without having to make a term change for this in the release after that. I anticipate that there is a window of about 30 days for this.

hollyel commented 3 years ago

Thanks @tucotuco. We will work on an example and aim to get that to you in a couple of weeks. In addition to the example, additional vocabulary may need to be developed to cover the paleo use cases we are working on and to encourage consistent adoption. That will of course take time, but we want to evaluate some of that to ensure the example aligns with that ongoing vocabulary development.

zedomel commented 3 years ago

@jhpoelen thanks for remembering the Biological Interaction Data symposium.

@hollyel, you are most welcome to join us, especially in the discussions in the Interest Group https://github.com/tdwg/interaction.

A paleo use case will be useful to validate the solution we are developing, so the paleo community can benefit from a common vocabulary for sharing species interactions.

tucotuco commented 3 years ago

> Thanks @tucotuco. We will work on an example and aim to get that to you in a couple of weeks. In addition to the example, additional vocabulary may need to be developed to cover the paleo use cases we are working on and to encourage consistent adoption. That will of course take time, but we want to evaluate some of that to ensure the example aligns with that ongoing vocabulary development.

@hollyel Do you have the additional example? We are otherwise ready to move forward on ratifying the terms from the latest public review.

hollyel commented 3 years ago

@tucotuco Yes! Apologies for the delay. We have reviewed with the Paleo Data Working Group and propose including "on slab with" in the example list. An example of this relationship: "a1be8a3c-090f-11e3-af8d-50faf7e7a06b on slab with a1beaa2a-090f-11e3-af8d-50faf7e7a06b". I included the links for easy access to the records and the associated images.
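As a rough illustration, the "on slab with" example could be written as a ResourceRelationship row along these lines. This is only a sketch: the tab-separated layout, the choice of columns, and the helper name are assumptions, and the first identifier is treated as the subject simply because it comes first in the example sentence.

```python
import csv
import io

# A subset of ResourceRelationship terms, chosen for illustration.
FIELDS = ["resourceID", "relatedResourceID", "relationshipOfResource"]

row = {
    "resourceID": "a1be8a3c-090f-11e3-af8d-50faf7e7a06b",
    "relatedResourceID": "a1beaa2a-090f-11e3-af8d-50faf7e7a06b",
    "relationshipOfResource": "on slab with",
}


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_tsv([row]), end="")
```

Because "on slab with" reads the same in both directions, a publisher might also choose to emit the reverse row, although nothing in this thread requires that.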

@zedomel Thanks! We are definitely interested in exploring the possible vocabulary needs for paleo more and have a few more examples we are working through as a group. I'm sure we will be reaching out :)

tucotuco commented 3 years ago

Incorporated, thanks.

gdadade commented 3 years ago

This term is used in GGBN regularly and we agreed on a vocabulary within GGBN, but I think we need to review it to ensure it can be used in a broader context. Currently we use e.g. "same individual", "same population" since the voucher can be either the same organism as the tissue or just from the same population. I hope we can add examples from our community for broader usage in the near future.

deepreef commented 3 years ago

> Currently we use e.g. "same individual", "same population" since the voucher can be either the same organism as the tissue or just from the same population.

Note that an instance of Organism can represent either a single individual organism, or an entire population of organisms. I'm not sure if that addresses the issue you raise, but because the scope of an instance of Organism can be scaled up to (and beyond) a population (as long as it's taxonomically homogeneous), it seems like there should be a way to accommodate both "same individual" and "same population" using appropriately-scaled instances of Organism.

tucotuco commented 3 years ago

Done.

ekrimmel commented 2 years ago

@tucotuco I notice that the example phrase "on slab with" didn't make it into the official term documentation (looking here), perhaps because we provided it last minute to you. Is it possible to get that included now, given that it is included in this issue and this issue has been reviewed and passed? Thanks!

@hollyel @tkarim

tucotuco commented 2 years ago

Neglected example added in new Issue #400. Closing this issue.