cmungall closed this 3 years ago
For GO annotations that are loaded as sets, this should be straightforward, and it should get more straightforward as the GOC refines the new (relations) qualifiers. We can harvest the gene-GO-term relations from GPAD/GPI files in the GOC monthly releases. For cellular component and molecular function, the genes in a set will be part_of or will enable the respective terms. For biological process, it makes the most sense at the start to use the broadest relation, acts_upstream_of_or_within.
Phenotype annotations might be straightforward as well. At least for sets that are generated from MODs like MGI, it looks like the has_phenotype relation fits any gene that has been manipulated to result in a given phenotype.
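To make the defaults above concrete, here is a minimal sketch of how a set's annotation could be turned into per-gene statements. The function name, the `term_kind` keys, and the gene identifiers are hypothetical and not part of GeneWeaver; only the relation labels come from the comment above.

```python
# Default relation per kind of ontology term, following the proposal above.
# The dictionary keys are illustrative labels, not an existing GeneWeaver schema.
DEFAULT_RELATION = {
    "cellular_component": "part_of",
    "molecular_function": "enables",
    "biological_process": "acts_upstream_of_or_within",
    "phenotype": "has_phenotype",
}


def gene_term_statements(genes, term_id, term_kind):
    """Return one (gene, relation, term) triple per member gene of a set."""
    relation = DEFAULT_RELATION[term_kind]
    return [(gene, relation, term_id) for gene in genes]


# Hypothetical gene identifiers; the term is a GO biological process.
statements = gene_term_statements(
    ["GeneA", "GeneB"], "GO:0006915", "biological_process"
)
for statement in statements:
    print(statement)
```

Each triple reads as 'a gene in this set acts_upstream_of_or_within the term', which is the broadest reading and can be narrowed later if the source annotation carries a more specific qualifier.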
Tagging some of the GW team: @echesler @treynr @jakeemerson @bergsalex @erichbaker
Following on this thread (I will continue to add), one of the types of sets we curate frequently is genes that are differentially regulated in response to a treatment or in a disease. For example, a paper will describe a set of genes that are up-regulated in triple negative breast cancer compared to control tissue. The curator will create a set of those genes and associate the set with 'breast carcinoma' (HP:0003002) and 'triple-receptor negative breast cancer' (DOID:0060081). This assignment is good for finding a set that 'has something to do' with these ontology terms, but ideally we would like to express the association more precisely, as an up-regulation of these genes. There is a relation 'over-expressed in' (RO:0002245), but the difficulty lies in whether or not this relation holds for (DOID:0060081) or (HP:0003002). Since a disease as defined by DO is a disposition and occurs_in has a range of a material or immaterial entity, I guess this would be a valid relationship for the DO class. For HP I'm not as certain that the 'expressed in' relations are appropriate. 'The presence of a carcinoma of the breast' does not seem like a valid range for 'over-expressed in' (RO:0002245). Is there a relation that describes a gene that is overexpressed as part_of(??) a phenotype?
@judyblake
Looking a little closer at 'over-expressed in' (RO:0002245), the range is an independent continuant, so it won't work for a disease, since a disposition is a specifically dependent continuant. So what is the relationship between an over- or under-expressed gene and a disease or a phenotype?
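The range problem can be illustrated with a toy check. This is only a sketch: the parent table is a hand-written fragment of the BFO hierarchy, the range entry restates what is said above about 'over-expressed in', and the helper names are hypothetical. A real check would use a reasoner over RO and BFO.

```python
# Hand-written fragment of the BFO class hierarchy (child -> parent).
PARENT = {
    "material entity": "independent continuant",
    "independent continuant": "continuant",
    "disposition": "realizable entity",
    "realizable entity": "specifically dependent continuant",
    "specifically dependent continuant": "continuant",
}

# Range constraint as described in the comment above.
RANGE = {"over-expressed in": "independent continuant"}


def is_a(cls, ancestor):
    """Walk up the toy hierarchy and report whether cls falls under ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def range_ok(relation, object_class):
    """True if object_class satisfies the declared range of relation."""
    return is_a(object_class, RANGE[relation])


print(range_ok("over-expressed in", "material entity"))  # a tissue-like object
print(range_ok("over-expressed in", "disposition"))      # a DO-style disease
```

The first call passes and the second fails, which is the mismatch described above: a disease modeled as a disposition cannot be the object of 'over-expressed in'.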
One of the challenges here is that a natural biological upper-level ontology doesn't necessarily align with BFO. Diseases and phenotypes are clearly very alike, to the extent that HP has seemingly identical classes to MONDO, DO or NCIT (your breast carcinoma example) (I have been discussing this with @mbrush). We've been putting in stricter D+R constraints in #261, and we have been using the expression 'disease or phenotype' a lot, which is IMHO a little unsatisfactory.
OK, let's look at your example, I would model this in granular terms for a disease as:
(disease) realized_in (disease process) causally upstream of, {+,-} effect (gene expression)
where the gene expression node occurs-in some (tissue) and has-input some (gene).
(this may be an overstatement as perhaps for some gene sets the overexpression may be causative or merely correlated?)
So if we have a shortcut relation for the + and - forms that should serve your purposes?
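As a rough sketch of what such a shortcut would stand for, the function below expands one signed disease-to-gene link into the granular chain written out above. The function name, the blank-node naming, and the example tissue and gene identifiers are hypothetical; the relation labels are taken from the pattern in this comment.

```python
def expand_shortcut(disease, gene, tissue, sign):
    """Expand a signed disease-to-gene shortcut into the long-form triples."""
    effects = {
        "+": "causally upstream of, positive effect",
        "-": "causally upstream of, negative effect",
    }
    if sign not in effects:
        raise ValueError("sign must be '+' or '-'")
    # Anonymous intermediate nodes for the process and the expression event.
    process = f"_:disease_process({disease})"
    expression = f"_:gene_expression({gene})"
    return [
        (disease, "realized_in", process),
        (process, effects[sign], expression),
        (expression, "occurs_in", tissue),
        (expression, "has_input", gene),
    ]


# Hypothetical gene and tissue identifiers; the disease is the DO term from the example.
triples = expand_shortcut("DOID:0060081", "GENE:example", "TISSUE:breast", "+")
for triple in triples:
    print(triple)
```

The inverse direction (gene to disease) would expand to the same four triples, so the choice of direction for the shortcut only affects how the gene set statement is phrased.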
I think the inverse relation would. I was thinking that a gene set would have member genes and the descriptions would be similar to 'A gene in this set property-ontology term'. See my GO example above. I also think phenotype examples for MP and HPO are reasonably straightforward. Since the curation stream at MGI captures genes that contribute to phenotypes when mutated, I think we could use 'has phenotype' as a valid relation. I assume that HPO has a similar paradigm. For the tissue example above, I assume that a tumor (or metastasis) is included? What ontology represents those material entities?
Maybe if we have a bit of spare time in Montreal, we can look at some specific gene sets? We will want to go beyond just cancer. @echesler and her group curate a lot of addiction- and behavior-related sets. We'd also be interested in some default relations that we could use for initial population, a generic relation between a continuant (gene) and any ontology term that we have used in the past.
The causal versus correlative point is well-taken. One of our use cases is to hypothesize about causal relationships from correlated ones using the GW tool suite. Many of our sets are correlated.
Trying to tag @judyblake again.
@sierra-moxon since this was tagged "biolink" (albeit by past-me, who may have just been guessing) do you happen to know if this can be closed?
Yes. I think so
GeneWeaver currently annotates gene sets with ontology classes, e.g. a gene set for a GO process, a phenotype or a disease. One problem here is that this is ambiguous as to the relationship between the gene and the term. What does it mean to be a disease gene set?
@ukemi is exploring adding relationships to GeneWeaver. What would help is if there were a subset of RO created for anything that could be used as an X-to-gene relationship, in particular for diseases.
This relates to work @mbrush has been doing on the predicate set for the biolink model in the NCATS Translator project.