For many infectious diseases of humans, the pathogen is transmitted via a vector such as a mosquito.
RO currently has a biotic interaction hierarchy for recording pairwise associations between organisms. This is sufficient for simple observational symbiotic relationships (e.g. human hosting Zika, mosquito hosting Zika) as well as trophic observations (mosquito feeding on human).
However, this is not sufficient to capture the ternary pathogen-vector-host relationship. While this can be done with a highly granular model that spans molecular, macroscopic and epidemiological levels (for example, representing the complete lifecycle), the requirement here is for a series of shortcut relations for databases such as GloBI that allow for queries such as:
what insects live in swamps and transmit diseases to humans?
what viruses are transmitted to mammals via arthropods?
what are the reservoirs of pathogen X?
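As a rough illustration of the kind of query these shortcut relations are meant to support, here is a minimal Python sketch. It assumes a database holds pairwise statements as simple (subject, relation, object) records; the `Interaction` type, the `vectors_transmitting_to` helper and the example records are all hypothetical and not part of RO or GloBI.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical pairwise record: subject, relation label, object."""
    subject: str
    relation: str
    obj: str


# Illustrative records only; relation labels are placeholders, not RO identifiers.
INTERACTIONS = [
    Interaction("Aedes aegypti", "vector-of", "Zika virus"),
    Interaction("Zika virus", "pathogen-of", "Homo sapiens"),
    Interaction("Ixodes scapularis", "vector-of", "Borrelia burgdorferi"),
    Interaction("Borrelia burgdorferi", "pathogen-of", "Homo sapiens"),
    Interaction("Culex pipiens", "vector-of", "West Nile virus"),
    Interaction("West Nile virus", "pathogen-of", "Equus caballus"),
]


def vectors_transmitting_to(host, interactions):
    """Return vectors of any agent recorded as a pathogen of `host`.

    This joins two pairwise statements on the shared agent, so it only
    suggests a pathogen-vector-host triple; it does not prove one.
    """
    pathogens = {
        i.subject for i in interactions
        if i.relation == "pathogen-of" and i.obj == host
    }
    vectors = {
        i.subject for i in interactions
        if i.relation == "vector-of" and i.obj in pathogens
    }
    return sorted(vectors)


print(vectors_transmitting_to("Homo sapiens", INTERACTIONS))
```

Note that a join like this can over-generate: a vector of an agent and a host of the same agent need not ever meet, which is one reason purely pairwise statements fall short of the ternary relationship.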
pathogen-vector relationships
We currently have the 'is vector for' relationship, which may need to be refined. It is not currently placed under symbiotic (recall that symbiotic encompasses mutualism through parasitism), I suppose because the association is short-lived in some cases? In any case, this would be the relationship between, say, Aedes aegypti and Zika virus. It would be good to get some clarification from domain experts on the correct usage of this relationship. Making this association, above and beyond a simple host-of relationship, requires some level of inference. This may often be obvious, but not always.
Some suggested criteria (possibly too strong, expert opinion sought)
the agent is non-pathogenic in the vector
the agent has evolved to use the vector as an intermediate, and cannot complete its life cycle in the vector
there is an additional host in which the agent completes its life cycle, and the vector transmits the agent to the host (e.g. via bites)
If these criteria are not fulfilled, then a new role/relation may be required, such as that of reservoir. This is not yet in RO; if we need to add it, we need to be careful to distinguish organismal reservoirs from non-organismal ones. See "When is a reservoir not a reservoir?"
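To make the suggested criteria concrete, they could be encoded as a simple check that a curator or pipeline runs before asserting the vector relationship. This is only a sketch: the `CarrierEvidence` fields and the `suggest_relation` function are hypothetical names, the criteria themselves are still awaiting expert review, and the "evolved to use the vector as an intermediate" part of the second criterion is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class CarrierEvidence:
    """Hypothetical evidence record for an agent observed in a candidate vector."""
    pathogenic_in_carrier: bool            # criterion 1 requires this to be False
    completes_life_cycle_in_carrier: bool  # criterion 2 requires this to be False
    transmits_to_additional_host: bool     # criterion 3 requires this to be True


def suggest_relation(evidence: CarrierEvidence) -> str:
    """Suggest 'vector-of' only when all three criteria hold.

    Anything else is flagged for review, since a different role
    (for example a reservoir) may be more appropriate.
    """
    meets_all = (
        not evidence.pathogenic_in_carrier
        and not evidence.completes_life_cycle_in_carrier
        and evidence.transmits_to_additional_host
    )
    return "vector-of" if meets_all else "needs-review"


print(suggest_relation(CarrierEvidence(False, False, True)))
print(suggest_relation(CarrierEvidence(True, False, True)))
```

Returning a review flag rather than asserting a reservoir relation directly reflects that the fallback relation is still undecided.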
Database guidelines for n-ary relationships
The fundamental unit of abstraction we are aiming at with these shortcut relationships is a ternary relationship between 3 species: pathogenic agent, vector and host (there may be cases where we have quaternary relationships, involving the pathogen crossing the species barrier, such as when one mammal eats another). This points to a more granular model in which multiple process instances can be connected in a graph, as in a LEGO model, but for databases that are primarily interested in shortcut relationships between species we should introduce guidelines here.
RO is not really intended for direct representation of ternary relations, since its representation language is OWL, in which properties are binary. So the proposed guideline here would be that we could make 3 statements:
1. AGENT pathogen-of HOST
2. VECTOR vector-of AGENT
3. VECTOR transmits-pathogen-to HOST
And then the database can include a mechanism to optionally link these, e.g. to say that situation 1 arises via 2 followed by 3. Implementation details may vary.
Alternatively, one could simply modify statement 1 with an additional relatum, to bring in the vector, but this is awkward for the reasons mentioned above (RO is in OWL), and furthermore prevents us from making more granular statements (e.g. the environment and locale for the initial pathogen-vector interaction may well differ from the environment and locale for the vector-host interaction).
Note that in the example above we are introducing a relation not yet in RO, for statement 3. We could simply use 'parasite-of', but if we have the evidence we may want to make the stronger statement.
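One possible shape for the database-side linking mechanism is sketched below in Python. It keeps the three pairwise statements separate, lets each carry its own locale, and adds a link record saying that the first statement arises via the other two in order. Everything here is an assumption for illustration: the `Statement` and `TransmissionLink` types, the `locale` field and the placeholder locale values are hypothetical, and statement 3 is read as relating the vector to the host, following the vector-host interaction described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    """Hypothetical pairwise statement with optional per-statement context."""
    subject: str
    relation: str
    obj: str
    locale: Optional[str] = None  # placeholder for environment/locale data


@dataclass(frozen=True)
class TransmissionLink:
    """Hypothetical link: `outcome` arises via `steps`, in the given order."""
    outcome: Statement
    steps: Tuple[Statement, Statement]


def is_consistent(link: TransmissionLink) -> bool:
    """Check that agent, vector and host line up across the three statements."""
    vector_step, transmit_step = link.steps
    return (
        vector_step.obj == link.outcome.subject            # same agent
        and transmit_step.subject == vector_step.subject   # same vector
        and transmit_step.obj == link.outcome.obj          # same host
    )


# Illustrative statements; locale values are placeholders, not real data.
s1 = Statement("Zika virus", "pathogen-of", "Homo sapiens")
s2 = Statement("Aedes aegypti", "vector-of", "Zika virus", locale="locale A")
s3 = Statement("Aedes aegypti", "transmits-pathogen-to", "Homo sapiens",
               locale="locale B")

link = TransmissionLink(outcome=s1, steps=(s2, s3))
print(is_consistent(link))
```

Because the locale sits on each statement rather than on the link, the pathogen-vector and vector-host interactions can be annotated independently, which a single statement with an extra relatum would not allow.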
Required for: https://github.com/jhpoelen/eol-globi-data/issues/206
See: https://github.com/jhpoelen/eol-globi-data/issues/207