plazi / arcadia-project


host relationship as custom metadata #143

Open myrmoteras opened 4 years ago

myrmoteras commented 4 years ago

@slint, here is a question about how we could model the virus/host relationship in the metadata.

In a publication, there is the virus, there is the host, and they form a relationship, "is host of". If there is only one of each, the relationship can be inferred. However, if a table lists many virus/host relationships, uploading the hosts and viruses as separate custom metadata no longer works, because the specific relation is lost.

This sounds like the problem we have with the geo-coordinates.

Might it thus be possible to create, analogous to the geo-coordinates, a metadata field for the host-virus relationship that includes the host and the virus as a pair?

This relationship is essential to keep, and it would be a great asset if we could model it like this.

Examples are currently being pulled together by Marcus and his team: https://docs.google.com/spreadsheets/d/1Rq57pmrA3V3k_PrKWGG0Y65-ZN9TRR497lmUu9wx0Cc/edit#gid=0 (see columns E-H).

Let us know what you think: whether this is possible and, if so, by when it could be done, so that we could use it in the initial upload of the files to coviho.
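To make the problem concrete, here is a minimal Python sketch of why separate host and virus fields lose the pairing. The field names (`hosts`, `viruses`, `pairs`), the helper functions, and all host and virus names are hypothetical placeholders; none of them come from the spreadsheet or from Zenodo's schema.

```python
# Hypothetical record for a paper that reports two host/virus relationships.
# All names are placeholders, not data from the spreadsheet.
flat_record = {
    "hosts": ["Host A", "Host B"],
    "viruses": ["Virus X", "Virus Y"],
}

# The same paper with each relationship stored as an explicit pair.
paired_record = {
    "pairs": [
        {"host": "Host A", "virus": "Virus X"},
        {"host": "Host B", "virus": "Virus Y"},
    ],
}


def flat_match(record, host, virus):
    """Match on two independent lists: the pairing is not checked."""
    return host in record["hosts"] and virus in record["viruses"]


def paired_match(record, host, virus):
    """Match only if host and virus occur in the same relationship."""
    return any(p["host"] == host and p["virus"] == virus for p in record["pairs"])


# "Host A" is not reported as a host of "Virus Y" in this hypothetical paper,
# yet the flat representation cannot tell the difference.
false_positive = flat_match(flat_record, "Host A", "Virus Y")
correct_answer = paired_match(paired_record, "Host A", "Virus Y")
```

With flat lists, a query for "Host A" together with "Virus Y" matches even though the paper never reports that relationship; the paired form rejects it.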

slint commented 4 years ago

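From the discussion we had, I can think of two ways to tackle this:

  • We can use dwc:relationshipOfResource (https://dwc.tdwg.org/terms/#dwc:relationshipOfResource) and dwc:relatedResourceID (https://dwc.tdwg.org/terms/#dwc:relatedResourceID) in the following fashion:

{
    'metadata': {
        ...
        'custom': {
            'dwc:relationshipOfResource': ['host to'],
            # Maybe a canonical identifier for the virus species would be better...
            'dwc:relatedResourceID': ['Severe Acute Respiratory Syndrome-CoV'],
        }
    }
}

  • To search for this metadata one would do a query for the custom fields, including both terms, like /search?custom=[dwc:relationshipOfResource]:"host of"&custom=[dwc:relatedResourceID]:"Severe Acute Respiratory Syndrome-CoV". This would return all records that include both relationshipOfResource:"host of" and relatedResourceID:"Severe Acute Respiratory Syndrome-CoV", which should be enough to match what we need.
    • This has the caveat that if in the future we decide to use these terms to add more values, things might not be as flexible, because of the "polluted" terms namespace.
    • I also stumbled across tdwg/dwc#194 (https://github.com/tdwg/dwc/issues/194), which seems to be discussing the direction of the relationship, so it would be good to take this into account.
  • The other is to use dwc:associatedTaxa (https://dwc.tdwg.org/terms/#dwc:associatedTaxa), but I think it's the worst in terms of findability and proper metadata representation:

{
    'metadata': {
        ...
        'custom': {
            'dwc:associatedTaxa': ['"host of":"Severe Acute Respiratory Syndrome-CoV"'],
        }
    }
}

Let me know what you think about these.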
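The search pattern discussed in this thread for the Darwin Core term pair, `/search?custom=[field]:"value"&custom=...`, contains brackets, quotes, and spaces that need URL encoding when a script sends the query. The sketch below uses only Python's standard library to build such a query string. The helper name `build_custom_query` is hypothetical, and the `custom` parameter syntax is taken from the thread as-is; whether the search endpoint accepts it in exactly this form is an assumption.

```python
from urllib.parse import urlencode


def build_custom_query(terms):
    """Return a query string with one `custom` parameter per (field, value) pair."""
    # Each parameter follows the pattern from the thread: [field]:"value"
    params = [("custom", f'[{field}]:"{value}"') for field, value in terms]
    return urlencode(params)


query = build_custom_query([
    ("dwc:relationshipOfResource", "host of"),
    ("dwc:relatedResourceID", "Severe Acute Respiratory Syndrome-CoV"),
])

# Relative path only; the host is left out because it depends on the deployment.
search_path = "/search?" + query
```

Passing a list of tuples to `urlencode` (rather than a dict) is what allows the `custom` key to be repeated.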

mguidoti commented 4 years ago

Hi Alex,

Sorry for the delay in answering this.

I think your suggestion doesn't cover a very common case: when the paper describes more than one virus-host relationship. If I understood you correctly, we would have x values of dwc:relationshipOfResource and x values of dwc:relatedResourceID, but no way to query the virus-host relationship itself, just viruses and hosts. This means that we wouldn't be able to query specific relationships, right?

Best,

On Wed, 8 Apr 2020 at 15:49, Alex Ioannidis <notifications@github.com> wrote:

From the discussion we had, I can think of two ways to tackle this:

  • We can use dwc:relationshipOfResource (https://dwc.tdwg.org/terms/#dwc:relationshipOfResource) and dwc:relatedResourceID (https://dwc.tdwg.org/terms/#dwc:relatedResourceID) in the following fashion:

{
    'metadata': {
        ...
        'custom': {
            'dwc:relationshipOfResource': ['host to'],
            # Maybe a canonical identifier for the virus species would be better...
            'dwc:relatedResourceID': ['Severe Acute Respiratory Syndrome-CoV'],
        }
    }
}

  • To search for this metadata one would do a query for the custom fields, including both terms, like /search?custom=[dwc:relationshipOfResource]:"host of"&custom=[dwc:relatedResourceID]:"Severe Acute Respiratory Syndrome-CoV". This would return all records that include both relationshipOfResource:"host of" and relatedResourceID:"Severe Acute Respiratory Syndrome-CoV", which should be enough to match what we need.
    • This has the caveat that if in the future we decide to use these terms to add more values, things might not be as flexible, because of the "polluted" terms namespace.
    • I also stumbled across tdwg/dwc#194 (https://github.com/tdwg/dwc/issues/194), which seems to be discussing the direction of the relationship, so it would be good to take this into account.
  • The other is to use dwc:associatedTaxa (https://dwc.tdwg.org/terms/#dwc:associatedTaxa), but I think it's the worst in terms of findability and proper metadata representation:

{
    'metadata': {
        ...
        'custom': {
            'dwc:associatedTaxa': ['"host of":"Severe Acute Respiratory Syndrome-CoV"'],
        }
    }
}

Let me know what you think about these.


-- M.

myrmoteras commented 4 years ago

@slint, here is some more context on the introduction of custom metadata for host relationships:

https://docs.google.com/document/d/1cKcQfx8X8uAXR6JF96jqZO8OCpwwkYatnxSIyVXvYbo/edit#

I also asked Jorrit and Nate to comment, especially since Jorrit is the global expert on these relationships, on how to model them, and on how to then use the data in his analyses.

A bit of context: within the CETAF Covid Task Force there is interest in submitting a RAPID proposal to the US NSF for assembling and analyzing data about host-virus specificity. They approached us as a data provider, with GloBI as the aggregator of the data and Nate Upham doing the analyses.

The submission date is May 5; it is a short proposal that will be reviewed rapidly. I am not sure whether this is realistic, but if we could add the three data fields and upload some publications so that we can show this in the proposal, that would be great. It is, in many ways, a rather special circumstance.

myrmoteras commented 4 years ago

@slint, any news regarding the implementation of the biotic interaction custom metadata? Thanks for a brief note.

myrmoteras commented 4 years ago

Dear Donat and Marcus,

After some stormy weeks on Zenodo, we've managed to get the biotic interaction relationships metadata deployed on the Sandbox system. I took an example from the Google sheet you've been working on, which tracks the COVID-19 related host species relationships, and, using the OBO vocabularies we discussed with Jorrit, created a minimal example for one of the papers here: https://sandbox.zenodo.org/record/621971. The interesting parts are in the custom metadata section:

[image: custom metadata section of the sandbox record]

Clicking on any of the links leads to a search query that returns records containing the specific term on its side of the relationship. For example, clicking on the "horseshoe bats" link would give results where the term "horseshoe bats" is on the left side of an "is host of" relationship. This means that we can capture metadata and construct complex queries that would allow answering questions like:

  • Find papers that mention any host species relationship with "SARS-CoV" (and/or variations of the name)

As discussed in the past, we should have a call to discuss how to proceed on setting up the process for creating the records, and then see what kind of information we can extract from the search API.

Best,
Alex
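As a rough illustration of the kind of question described above, the sketch below models each relationship as a (left term, relation, right term) triple and filters a local list of them. It does not call the Zenodo search API. The `Relationship` type, the helper name, and the record identifiers are hypothetical, and the triples only reuse terms mentioned in this thread or placeholders; they are not taken from the sandbox record.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    record_id: str   # hypothetical identifier of the paper's record
    left: str        # term on the left side, e.g. the host
    relation: str    # e.g. "is host of"
    right: str       # term on the right side, e.g. the virus


def records_with_host_of(relationships, virus_names):
    """Return ids of records with an "is host of" relationship to any given name."""
    # Compare case-insensitively so simple spelling variations can be listed once.
    wanted = {name.lower() for name in virus_names}
    return sorted({
        r.record_id
        for r in relationships
        if r.relation == "is host of" and r.right.lower() in wanted
    })


# Illustrative triples only; "rec-1" and "rec-2" are made-up identifiers.
relationships = [
    Relationship("rec-1", "horseshoe bats", "is host of", "SARS-CoV"),
    Relationship("rec-2", "Host B", "is host of", "Virus Y"),
]

matches = records_with_host_of(relationships, ["SARS-CoV"])
```

Because each triple keeps the left and right terms together, the filter can only match a virus name that appears in the same relationship as the host, which is the property the separate-field approach could not guarantee.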