monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Anatomy associations are misleading #388

Open kevinschaper opened 11 months ago

kevinschaper commented 11 months ago

Anatomy associations are misleading. These include GO annotations. For example, an S. cerevisiae gene is shown as "expressed in" nucleotide-excision repair factor 3 complex, which seems strange (it's part_of, not an expression location).

Corey Cox - This sounds like an ontology problem, not a Monarch problem? Should we be trying to manage this somehow?

kevinschaper commented 11 months ago

We should probably use a label like Expression or ExpressionSite (or even Anatomy/CC?) to reflect that we have both anatomical ontology terms and GO cellular component terms.

amc-corey-cox commented 11 months ago

Moni is going to nicely state that this is what the data is.

amc-corey-cox commented 11 months ago

Can we create a link to the Ontology to file an issue?

monicacecilia commented 11 months ago

Responded and filed https://github.com/geneontology/helpdesk/issues/459

cmungall commented 11 months ago

I don't think there is a GO issue here. The GO GAF says part_of for the relationship between SGD genes and complexes, e.g.:

SGD S000000973 RAD3 part_of GO:0000112 PMID:8855246 IDA C 5' to 3' DNA helicase YER171W|REM1|TFIIH/NER complex ATP-dependent 5'-3' DNA helicase subunit RAD3 protein taxon:559292 20100601 SGD UniProtKB:P06839
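To make the relation in that line easier to see, here is a minimal Python sketch that splits a GAF row into named columns and reads the qualifier. The column names follow the 17-column GAF 2.x layout; the tab positions in `EXAMPLE` are reconstructed from the line above (the tabs were lost in this thread), so treat the empty With/From and Annotation Extension columns as assumptions. `parse_gaf_line` is a hypothetical helper, not part of any GO or Monarch library.

```python
# Column names for a 17-column GAF 2.x row (names chosen for this sketch).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


def parse_gaf_line(line):
    """Split one tab-separated GAF row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Pad short rows so trailing optional columns come back as empty strings.
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    return dict(zip(GAF_COLUMNS, fields))


# The RAD3 row from the comment above, with tab boundaries reconstructed
# (assumption: With/From and Annotation Extension are empty).
EXAMPLE = "\t".join([
    "SGD", "S000000973", "RAD3", "part_of", "GO:0000112", "PMID:8855246",
    "IDA", "", "C", "5' to 3' DNA helicase",
    "YER171W|REM1|TFIIH/NER complex ATP-dependent 5'-3' DNA helicase subunit RAD3",
    "protein", "taxon:559292", "20100601", "SGD", "", "UniProtKB:P06839",
])

row = parse_gaf_line(EXAMPLE)
print(row["qualifier"], row["go_id"], row["aspect"])
```

The point is only that the GAF row carries an explicit relation (`part_of`) in the qualifier column, which is the information the expression record does not have.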

kevinschaper commented 11 months ago

This is what it looks like in the Alliance SGD expression file:

        {
            "assay": "MMO:0000642",
            "dateAssigned": "2010-06-01T00:06:00-00:00",
            "evidence": {
                "crossReference": {
                    "id": "SGD:S000044877",
                    "pages": [
                        "reference"
                    ]
                },
                "publicationId": "PMID:8855246"
            },
            "geneId": "SGD:S000000973",
            "whenExpressed": {
                "stageName": "N/A"
            },
            "whereExpressed": {
                "cellularComponentTermId": "GO:0000112",
                "whereExpressedStatement": "nucleotide-excision repair factor 3 complex"
            }
        },

I'm not sure if the problem is that this format is lossy (like, we want to express this as a post-composition of whole organism and GO:0000112) or if this is information that we want from the GO annotation ingest and we just want to exclude it from our Alliance expression ingest. (Is getting gene expression from SGD redundant with getting GO:CC annotations?)
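One way to gauge how much of the Alliance expression ingest is affected would be to flag records whose only location is a GO cellular component term. The sketch below is a rough illustration, not the actual ingest code: `is_cc_only` is a hypothetical helper, and `anatomicalStructureTermId` is an assumed field name for the anatomy term (only `cellularComponentTermId` and `whereExpressedStatement` appear in the record above).

```python
import json

# The SGD record quoted above, trimmed to the fields this sketch uses.
SAMPLE_RECORD = json.loads("""
{
    "geneId": "SGD:S000000973",
    "whereExpressed": {
        "cellularComponentTermId": "GO:0000112",
        "whereExpressedStatement": "nucleotide-excision repair factor 3 complex"
    }
}
""")


def is_cc_only(record):
    """True if the record has a GO cellular component term but no anatomy term.

    'anatomicalStructureTermId' is an assumed field name for this sketch.
    """
    where = record.get("whereExpressed", {})
    has_cc = bool(where.get("cellularComponentTermId"))
    has_anatomy = bool(where.get("anatomicalStructureTermId"))
    return has_cc and not has_anatomy


def split_records(records):
    """Partition records into (cc_only, other) lists."""
    cc_only = [r for r in records if is_cc_only(r)]
    other = [r for r in records if not is_cc_only(r)]
    return cc_only, other


cc_only, other = split_records([SAMPLE_RECORD])
print(len(cc_only), len(other))
```

Records in the `cc_only` group are the ones that may duplicate GO cellular component annotations and are candidates for exclusion or for a different predicate.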

pgaudet commented 11 months ago

Thanks for the clarification. We changed the top level of GO a few years ago to clearly separate protein-containing complexes:

image

So, your script should not apply an expression annotation for children of protein-containing complex, but only for children of cellular anatomical entity (and virion component, but I don't think this is relevant for Alliance).
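The rule Pascale describes can be sketched as an ancestor check. This is a minimal illustration under stated assumptions: `TOY_PARENTS` is a hand-written, heavily simplified parent map standing in for the real GO graph, `keep_as_expression` is a hypothetical helper, and the top-level IDs (GO:0032991 for protein-containing complex, GO:0110165 for cellular anatomical entity) should be checked against the current GO release before use.

```python
PROTEIN_CONTAINING_COMPLEX = "GO:0032991"
CELLULAR_ANATOMICAL_ENTITY = "GO:0110165"

# Toy parent map for illustration only; the real hierarchy has more levels.
TOY_PARENTS = {
    "GO:0000112": ["GO:0000109"],  # NEF3 complex -> nucleotide-excision repair complex
    "GO:0000109": [PROTEIN_CONTAINING_COMPLEX],  # simplified
    "GO:0005634": [CELLULAR_ANATOMICAL_ENTITY],  # nucleus, simplified
}


def ancestors(term, parents):
    """Return all ancestors of a term by walking the parent map."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def keep_as_expression(term, parents=TOY_PARENTS):
    """True only for terms under cellular anatomical entity and not under
    protein-containing complex."""
    lineage = ancestors(term, parents) | {term}
    if PROTEIN_CONTAINING_COMPLEX in lineage:
        return False
    return CELLULAR_ANATOMICAL_ENTITY in lineage


print(keep_as_expression("GO:0005634"))  # nucleus
print(keep_as_expression("GO:0000112"))  # NEF3 complex
```

In a real ingest the parent map would come from the ontology itself rather than a hand-written dict; terms with no known lineage are rejected here, which is a conservative choice rather than something the thread prescribes.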

Thanks, Pascale

vanaukenk commented 11 months ago

I think some follow-up with the Alliance Expression Working Group would be good, so that what they export in the Alliance Expression JSON and TSV aligns with how these various resources (e.g. Alliance, GO, Monarch) think about expression data.

One proposal that the Expression Working Group is considering is to update the Alliance expression LinkML model to have more explicit gene product-to-term relations, as GO does in the GAF and GPAD files. This seems to me a better long-term solution so that anyone who ingests the Alliance expression files would also always have the correct relations available to display as needed.

Tagging @draciti

draciti commented 11 months ago

Hello, just confirming that there is an ongoing discussion about adding relationships to the Alliance expression LinkML model. Once there is an update, I will post it here.

draciti commented 9 months ago

Update: We will include GO relationships in the Alliance expression LinkML model for subcellular localization. (Discussed on Dec 18th, 2023, Alliance Expression WG).

monicacecilia commented 2 months ago

@kevinschaper 👀 👆 - FYI.