trias-project / uredinales-belgium-checklist

🍃 Catalogue of the Rust Fungi of Belgium
https://trias-project.github.io/uredinales-belgium-checklist
MIT License

Add host/parasite relationships #8

Open peterdesmet opened 6 years ago

peterdesmet commented 6 years ago

The checklist contains interesting parasite/host information in associatedTaxa. There are 3 ways we could add this information:

associatedTaxa

There is a specific field where this info can be expressed in the form of relationship: taxon, but that field is not part of the taxon core.
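As a minimal sketch of how values in the `relationship: taxon` form could be split apart: the `|` separator between pairs, the function name, and the sample host names are assumptions for illustration, not taken from the checklist.

```python
def parse_associated_taxa(value, separator="|"):
    """Split an associatedTaxa string into (relationship, taxon) pairs.

    Assumes pairs look like "relationship: taxon" and are joined by
    `separator` (hypothetical convention).
    """
    pairs = []
    for part in value.split(separator):
        part = part.strip()
        if not part:
            continue  # skip empty fragments such as a trailing separator
        relationship, colon, taxon = part.partition(":")
        if colon:
            pairs.append((relationship.strip(), taxon.strip()))
        else:
            # No relationship given: keep the taxon, leave the relationship empty
            pairs.append(("", part))
    return pairs


print(parse_associated_taxa("host: Rosa canina | host: Rubus idaeus"))
```

Values without a colon are kept with an empty relationship rather than dropped, so nothing is silently lost during parsing.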

description

We could treat every relationship as a description in the form of:

That information will appear on GBIF, but there is no semantic information about the description, i.e. it is unknown that these are species.

resourceRelationship

The proper way to express this information is in Resource Relationship, which defines a relationship between two (here) taxa. Even though we only have the scientificName for every host, these should become their own record in the taxon core and will be treated by GBIF as any normal taxon (which is the ideal scenario).

Note that for parsing the associatedTaxa information:

The resource relationship between rust fungi A and host B could be expressed as:

| id | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo |
| --- | --- | --- | --- | --- |
| A | (A) | B | host plant of | = bibliographicReference or source |
| A | B | A | parasite of | = bibliographicReference or source |
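A rough sketch of how both rows could be generated for one rust fungus and one host, written out as tab-separated text. The helper names (`relationship_rows`, `to_tsv`) are hypothetical, and the column set simply mirrors the example above rather than any published mapping.

```python
import csv
import io

FIELDS = [
    "id",
    "resourceID",
    "relatedResourceID",
    "relationshipOfResource",
    "relationshipAccordingTo",
]


def relationship_rows(parasite_id, host_id, source):
    """Return the relationship in both directions for one parasite/host pair.

    relationshipOfResource describes the related resource relative to the
    subject, so "host plant of" sits on the row whose subject is the parasite.
    """
    return [
        {
            "id": parasite_id,
            "resourceID": parasite_id,
            "relatedResourceID": host_id,
            "relationshipOfResource": "host plant of",
            "relationshipAccordingTo": source,
        },
        {
            "id": parasite_id,
            "resourceID": host_id,
            "relatedResourceID": parasite_id,
            "relationshipOfResource": "parasite of",
            "relationshipAccordingTo": source,
        },
    ]


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = relationship_rows("A", "B", "bibliographicReference or source")
print(to_tsv(rows))
```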

It is unfortunate that the most logical relationship is host plant of, which is less specific than parasite of but that is because the definition for relationshipOfResource is:

The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary. see also http://rs.tdwg.org/dwc/terms/index.htm#relationshipOfResource

qgroom commented 6 years ago

I certainly like the idea of using resourceRelationship, particularly as it represents the relationship in both directions. I feel I am asking for a lot of work though. Nevertheless, this is not at all an exceptional dataset and representing interactions in Darwin Core has been a long overlooked subject.

peterdesmet commented 6 years ago

I asked @mdoering: he couldn't name a dataset off the top of his head that includes this extension, and he said that the extension is not available through the API (unlike description, distribution, etc.). So, doing this for this dataset would make it an exemplar dataset.

LienReyserhove commented 6 years ago

Taking this into account, I certainly want to spend some time on doing the mapping.

qgroom commented 6 years ago

Last week I was in Bari talking only about species interactions data, particularly with @jhpoelen, who maintains globalbioticinteractions.org. An exemplar dataset would be nice! There will also be a workshop and symposium at TDWG 2018 on interactions data.

jhpoelen commented 6 years ago

Some example datasets using various flavors of dwca-ish archives: http://amnh.begoniasociety.org/dwc/rss.xml via https://zenodo.org/record/229577#.WqGrUWaEeis , GloBI exports eol proposed associations (see https://globalbioticinteractions.org/data). These are off the top of my head; there are definitely more than that.

From my perspective (as a data integrator) the current usage of associatedTaxa (part of the occurrence row-type/table) looks suitable and should be relatively easy to manage and integrate. In fact, would you mind if I include your current checklist in GloBI?

It is a bummer that GBIF chose not to index this valuable field. I was hoping that GloBI would take on this task some day, I (or anyone else) just didn't get to it yet. It actually shouldn't be too hard to do (famous last words).

Wouldn't want to inject myself into a lengthy conversation about how to best represent the data. However, I would like to say that using id/label pair (e.g., http://purl.obolibrary.org/obo/RO_0002444 and parasite of) to describe an interaction would save a lot of integration headaches.
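To illustrate the id/label idea, here is a minimal sketch that attaches an IRI to a free-text relationship label. Only the `parasite of` mapping comes from this thread; the lookup table, the function name and the `id`/`label` key names are assumptions for illustration.

```python
# Label -> IRI lookup; only "parasite of" is taken from the discussion above.
RELATIONSHIP_IRIS = {
    "parasite of": "http://purl.obolibrary.org/obo/RO_0002444",
}


def as_id_label(label):
    """Return an id/label pair; the id is None when no IRI is known."""
    iri = RELATIONSHIP_IRIS.get(label.strip().lower())
    return {"id": iri, "label": label}


print(as_id_label("parasite of"))
print(as_id_label("host plant of"))
```

An integrator can then match on the `id` and fall back to the `label` only for display or cross-checking.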

peterdesmet commented 6 years ago

@jhpoelen

  1. associatedTaxa: would have been the quickest solution, but that term is not available in the taxon core
  2. Sure, but note that this checklist has not been published yet. 😄
  3. Yeah, parasite of is the most logical way to express the relation. Works in resourceRelationship, but is just a bit against the grain.

jhpoelen commented 6 years ago

@peterdesmet I've seen folks use the statements from literature as occurrences with basisOfRecord set to literature. Perhaps in this context the usage of associatedTaxa would fare a little better with the dwca crowd. Are you considering adding the source citation to the table as well?

I am ok with integrating pre-publication data in GloBI. In fact, I think that it is a good thing to iron out integration issues before publication to avoid the great-idea-but-our-grant-funding-ran-out-and-hoping-to-have-a-look-at-it-later response. Also, I believe that discussions like the one we are having now benefit from having specific examples using real data.

Yesterday, I forked the dataset and added some metadata to make the uredinales checklist discoverable via GloBI. I'll create a pull request shortly.

jhpoelen commented 6 years ago

@peterdesmet @qgroom @LienReyserhove Was just reviewing this thread in context of a related discussion about integrating dwca into GloBI. I am hoping to complete the integration using the more explicit form using http://rs.gbif.org/extension/dwc/resource_relation_2018_01_18.xml and use it as a showcase example in future discussions. Please let me know if you have any questions about my pull request https://github.com/trias-project/uredinales-belgium-checklist/pull/9 .

On a related note, I'd very much like to see a URI pointing to a definition for the relationship types used in the Resource Relation (e.g., http://purl.obolibrary.org/obo/RO_0002557 when using "has pathogen"). What is the proper channel to propose this?

peterdesmet commented 6 years ago

Hi @jhpoelen we'll have a look, but don't know when we'll be able to do it (rushing to prepare for TDWG conference later this month).

Regarding the URI describing the relationship, that should be possible using the dwciri: prefix/namespace. So relationshipOfResource would be parasite of and dwciri:relationshipOfResource would be an IRI to a definition for that. /ping @baskaufs. The extension still needs to be adapted to allow this though.

jhpoelen commented 6 years ago

The dwciri: approach to point to the definition of a relationship type sounds like a pretty good idea. I imagine that the IRI would be used by machines and the label / literal would mainly be used by humans or for cross-check purposes.

Using your earlier example as a starting point, I imagine it'd look something like:

| id | resourceID | relatedResourceID | dwciri:relationshipOfResource | relationshipOfResource | relationshipAccordingTo |
| --- | --- | --- | --- | --- | --- |
| A | (A) | B | http://example.org/some_term_iri | host plant of | = bibliographicReference or source |
| A | B | A | http://purl.obolibrary.org/obo/RO_0002444 | parasite of | = bibliographicReference or source |

baskaufs commented 6 years ago

This approach makes sense to me in the context of a spreadsheet. However, in sec 3.8 of the RDF guide we explicitly said that the term dwciri:relationshipOfResource did not exist for reasons given in the notes.

At that time, we were thinking pretty narrowly about how the dwciri: terms would be used in RDF. However, since then, it seems pretty clear to me that the dwciri: terms would also be useful even in spreadsheets any time a URI would provide an unambiguous reference as it does here (even if that doesn't strictly make sense as RDF).

So I think to accomplish what you want to do would require a new term request, since the RDF Guide specifically says the term doesn't exist.

jhpoelen commented 6 years ago

@baskaufs Thanks for providing the context. I can see the trouble with using specific relationships without explicitly specifying the types of resources they relate to. Rather than blurring the boundary of the rdf and non-rdf worlds, I'd go for introducing a relationshipOfResourceID term, very similar to resourceID and relatedResourceID: the id points to a definition of the resource/relationship type, whereas the human readable labels of relationshipOfResource are primarily used to please humans (not machines). Similarly, I'd go for introducing relationshipAccordingToID for similar reasons.

As I am thinking about the topic of associations between taxa, I do have some more questions about this representation. In its current form we express something like:

Taxon A is known to eat taxon B according to evidence C.

However, taxa don't eat each other, only occurrences of taxa do. So, we're really saying:

Some occurrence X of Taxon A ate some occurrence Y of taxon B according to evidence C.

In this train of thought, I would expect the relationships in the resource relation table to be between the occurrences X and Y rather than between taxa A and B. This would also make it easier to distinguish properties that relate to a taxon, like a name and naming authority, from properties that relate to a specific occurrence of a taxon, like eventDate, location, habitat etc. I'd still have to figure out how to capture the properties of the relationship itself. An example of such a relationship-specific property is the method used to determine the relationship (e.g., stomach/poop analysis, text mining of literature).

So, summarizing all this in a table would look something like:

| id | resourceID | relatedResourceID | relationshipOfResourceID | relationshipOfResource | relationshipAccordingToID | relationshipAccordingTo | relationshipPropertyX | relationshipPropertyY | ... |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | (X) | Y | http://example.org/some_term_iri | host plant of | doi:10.123/345 | = bibliographicReference or source | some value | some value | ... |
| A | Y | X | http://purl.obolibrary.org/obo/RO_0002444 | parasite of | doi:10.123/567 | = bibliographicReference or source | some value | some value | ... |

Note that I am leaving out the definition of occurrences X, Y in appropriate occurrence tables.
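A small sketch of what one such record could look like in code, assuming the proposed relationshipOfResourceID and relationshipAccordingToID columns. The `Relationship` class and the `relationshipMethod` property name are hypothetical and only stand in for the relationshipPropertyX/Y placeholders above.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """One row of the occurrence-level relationship table sketched above."""

    core_id: str
    resource_id: str
    related_resource_id: str
    relationship_id: str      # IRI defining the relationship type
    relationship_label: str   # human-readable label
    according_to_id: str      # identifier of the evidence, e.g. a DOI
    according_to: str         # human-readable citation
    properties: dict = field(default_factory=dict)  # relationship-specific extras

    def to_row(self):
        """Flatten the record into a single dict keyed by column name."""
        row = {
            "id": self.core_id,
            "resourceID": self.resource_id,
            "relatedResourceID": self.related_resource_id,
            "relationshipOfResourceID": self.relationship_id,
            "relationshipOfResource": self.relationship_label,
            "relationshipAccordingToID": self.according_to_id,
            "relationshipAccordingTo": self.according_to,
        }
        row.update(self.properties)  # extra columns are appended as-is
        return row


example = Relationship(
    core_id="A",
    resource_id="Y",
    related_resource_id="X",
    relationship_id="http://purl.obolibrary.org/obo/RO_0002444",
    relationship_label="parasite of",
    according_to_id="doi:10.123/567",
    according_to="bibliographicReference or source",
    properties={"relationshipMethod": "text mining of literature"},
)
print(example.to_row())
```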

baskaufs commented 6 years ago

The relationshipOfResourceID term seems like a good solution.

jhpoelen commented 6 years ago

Assuming that new terms are proposed through a github issue, I created https://github.com/tdwg/dwc/issues/186 to help things move along.

qgroom commented 6 years ago

I will try to find out how to propose new terms officially. I suspect we need to compose a justification, show examples and propose suitable changes to the documentation. However, it seems to me that we know what this should all be and it should not take too long.

baskaufs commented 6 years ago

I believe that the process is to do what @jhpoelen has already done: open an issue on the Darwin Core tracker. What has typically happened in the past is that term submission proposals have gotten traction when multiple people have indicated support for the proposal. I think the informal (maybe formal?) requirement is that at least two independent agents (persons/organizations) need to say that they need the term.

I was thinking that there was an actual issue template for proposing new issues, but I don't see it right now. @peterdesmet or @tucotuco would know about that.

jhpoelen commented 6 years ago

@baskaufs please let me know if there's anything I can do to push this proposal forward. I feel that many would benefit from the proposed improvements (e.g., @seltmann in https://github.com/ParasiteTracker/vampire-moth-dwca/issues/2 ).

baskaufs commented 6 years ago

Historically, Darwin Core term additions have happened at the speed of a glacier. I think the more interest and discussion about the proposals (https://github.com/tdwg/dwc/issues/186 https://github.com/tdwg/dwc/issues/187), the more likely that something will happen in the near future.

jhpoelen commented 6 years ago

Ok, @peterdesmet @qgroom @baskaufs @tucotuco @LienReyserhove could you please add all the emoticons, comments and thumbs up you can spare to express your unwavering support for term proposals https://github.com/tdwg/dwc/issues/186 https://github.com/tdwg/dwc/issues/187 that followed from our discussion?

I am hoping to actually start using them in GloBI and share your examples as "the" way to capture species interaction records in DwC. Perhaps we can even get the Symbiota users folks (e.g., @seltmann @neilcobb) on board to help push for supporting it in the various tools . . .

neilcobb commented 6 years ago

Happy to help solicit participation by Symbiota users
