Open kevinschaper opened 11 months ago
We should probably use a label like Expression or ExpressionSite (or even Anatomy/CC?) to match that we have both anatomical ontology terms and GO cellular component terms.
Moni is going to nicely state that this is what the data is.
Can we create a link to the Ontology to file an issue?
Responded and filed https://github.com/geneontology/helpdesk/issues/459
I don't think there is a GO issue here. The GO GAF says part_of for the relationship between SGD genes and complexes, e.g.:
SGD S000000973 RAD3 part_of GO:0000112 PMID:8855246 IDA C 5' to 3' DNA helicase YER171W|REM1|TFIIH/NER complex ATP-dependent 5'-3' DNA helicase subunit RAD3 protein taxon:559292 20100601 SGD UniProtKB:P06839
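To make the relation column explicit, here is a minimal Python sketch that reads the leading columns of a GAF line and pulls out the qualifier. The column names follow the GAF 2.2 layout; the tab placement in `example` is reconstructed from the line above (With/From is assumed empty), and `parse_gaf_prefix` is a hypothetical helper, not part of any GO tooling.

```python
# Names for the first nine GAF 2.2 columns; later columns are ignored here.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect",
]


def parse_gaf_prefix(line: str) -> dict:
    """Map the leading tab-separated fields of one GAF line to column names."""
    fields = line.rstrip("\n").split("\t")
    return dict(zip(GAF_COLUMNS, fields))


# Assumption: tab positions reconstructed from the example; With/From left empty.
example = "\t".join([
    "SGD", "S000000973", "RAD3", "part_of", "GO:0000112",
    "PMID:8855246", "IDA", "", "C",
])

record = parse_gaf_prefix(example)
print(record["qualifier"], record["go_id"], record["aspect"])
```

The point is only that the GAF row carries the relation (`part_of`) alongside the term, which the expression JSON does not.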
This is what it looks like in the Alliance SGD expression file:
{
  "assay": "MMO:0000642",
  "dateAssigned": "2010-06-01T00:06:00-00:00",
  "evidence": {
    "crossReference": {
      "id": "SGD:S000044877",
      "pages": [
        "reference"
      ]
    },
    "publicationId": "PMID:8855246"
  },
  "geneId": "SGD:S000000973",
  "whenExpressed": {
    "stageName": "N/A"
  },
  "whereExpressed": {
    "cellularComponentTermId": "GO:0000112",
    "whereExpressedStatement": "nucleotide-excision repair factor 3 complex"
  }
},
I'm not sure if the problem is that this format is lossy (like, we want to express this as a post-composition of whole organism and GO:0000112) or if this is information that we want from the GO annotation ingest and we just want to exclude it from our Alliance expression ingest. (Is getting gene expression from SGD redundant with getting GO:CC annotations?)
Thanks for the clarification. We changed the top level in GO a few years ago to clearly separate protein-containing complexes from cellular anatomical entities.
So, your script should not apply an expression annotation for children of protein-containing complex, but only for children of cellular anatomical entity (and virion component, but I don't think this is relevant for Alliance).
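A rough sketch of that rule in Python, assuming the ingest has access to the ontology's is_a parents. `TOY_PARENTS` is a deliberately simplified, hypothetical parent map (the real GO graph has intermediate terms and other relation types), and `is_expression_site` is an illustrative name, not an existing function.

```python
PROTEIN_CONTAINING_COMPLEX = "GO:0032991"
CELLULAR_ANATOMICAL_ENTITY = "GO:0110165"

# Hypothetical, simplified parent map; a real ingest would load this from GO.
TOY_PARENTS = {
    "GO:0000112": ["GO:0032991"],  # nucleotide-excision repair factor 3 complex
    "GO:0005634": ["GO:0110165"],  # nucleus
}


def ancestors(term: str, parents: dict) -> set:
    """Collect all ancestors of a term by walking the parent map."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_expression_site(term: str, parents: dict) -> bool:
    """True only for terms under cellular anatomical entity and not under
    protein-containing complex."""
    lineage = ancestors(term, parents) | {term}
    if PROTEIN_CONTAINING_COMPLEX in lineage:
        return False
    return CELLULAR_ANATOMICAL_ENTITY in lineage


print(is_expression_site("GO:0000112", TOY_PARENTS))
print(is_expression_site("GO:0005634", TOY_PARENTS))
```

With this check, the RAD3 / GO:0000112 record above would be skipped by the expression ingest and left to the GO annotation ingest.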
Thanks, Pascale
I think some follow-up with the Alliance Expression Working Group would be good, so that what they export in the Alliance Expression JSON and TSV is in alignment with how these various resources (e.g. Alliance, GO, Monarch) are thinking about expression data.
One proposal that the Expression Working Group is considering is to update the Alliance expression LinkML model to have more explicit gene product-to-term relations, as GO does in the GAF and GPAD files. This seems to me a better long-term solution so that anyone who ingests the Alliance expression files would also always have the correct relations available to display as needed.
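As a sketch of what a consumer might do once the relation is exported, the Python below models a `whereExpressed` entry with an optional relation. The `relation` field and the `display_predicate` helper are hypothetical; the actual LinkML slot names have not been decided in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WhereExpressed:
    cellular_component_term_id: str
    where_expressed_statement: str
    # Hypothetical field: the gene product-to-term relation, as in GAF/GPAD.
    relation: Optional[str] = None


def display_predicate(where: WhereExpressed) -> str:
    """Use the exported relation when present, else the generic label."""
    return where.relation or "expressed_in"


nef3 = WhereExpressed(
    cellular_component_term_id="GO:0000112",
    where_expressed_statement="nucleotide-excision repair factor 3 complex",
    relation="part_of",
)
print(display_predicate(nef3))
```

Records without a relation fall back to the generic label, so existing files would still load unchanged.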
Tagging @draciti
Hello, just confirming that there is an ongoing discussion about adding relationships to the Alliance expression LinkML model. Once there is an update, I will post it here.
Update: We will include GO relationships in the Alliance expression LinkML model for subcellular localization. (Discussed on Dec 18th, 2023, Alliance Expression WG).
@kevinschaper 👀 👆 - FYI.
Anatomy associations are misleading. These include GO annotations. For example, S. cerevisiae has "expressed in nucleotide-excision repair factor 3 complex", which seems strange (it's part_of, not an expression location).
Corey Cox - This sounds like an ontology problem not a Monarch problem? Should we be trying to manage this somehow?