pantherdb / pango

BSD 3-Clause "New" or "Revised" License

Add "Unknown [aspect]" annotation if gene is missing annotations to any GO aspect #4

Open · dustine32 opened this issue 1 year ago

dustine32 commented 1 year ago

In the annotations JSON input file, create an annotation for a gene and a specific GO aspect whenever the gene has no annotations to that aspect.

For example, if gene XYZ has annotations to GO:0000093 (a BP term) and GO:0000095 (an MF term) but none to CC, a new annotation should be created like this:

    {
        "gene": "UniProtKB:12345",
        "gene_symbol": "XYZ",
        "gene_name": "Fake gene not real",
        "term": "UNKNOWN:003",
        "slim_terms": [
            "OTHER:003"
        ],
        "qualifiers": "is_active_in",
        "evidence": []
    }

In the accompanying ontology JSON, these UNKNOWN terms should be defined:

    {
        "ID": "UNKNOWN:003",
        "LABEL": "CC unknown",
        "hasOBONamespace": "cellular_component",
        "is_goslim": false
    }
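
A minimal Python sketch of generating one such ontology entry per aspect. Only UNKNOWN:003 (CC) is defined in this issue, so the MF and BP IDs (UNKNOWN:001, UNKNOWN:002) and the `unknown_term_definitions` helper are hypothetical placeholders, not the actual pango output.

```python
import json

# Aspect -> (UNKNOWN term ID, OBO namespace).
# Only UNKNOWN:003 (CC) comes from this issue; MF and BP IDs are assumed.
UNKNOWN_TERMS = {
    "MF": ("UNKNOWN:001", "molecular_function"),  # hypothetical ID
    "BP": ("UNKNOWN:002", "biological_process"),  # hypothetical ID
    "CC": ("UNKNOWN:003", "cellular_component"),
}


def unknown_term_definitions():
    """Build one ontology entry per GO aspect, in the shape shown above."""
    return [
        {
            "ID": term_id,
            "LABEL": f"{aspect} unknown",
            "hasOBONamespace": namespace,
            "is_goslim": False,
        }
        for aspect, (term_id, namespace) in UNKNOWN_TERMS.items()
    ]


if __name__ == "__main__":
    print(json.dumps(unknown_term_definitions(), indent=4))
```

Serialized with `json.dumps`, the CC entry matches the JSON above (`False` becomes `false`).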

Some questions for @thomaspd @huaiyumi:

  1. Should slim_terms be empty in this case? If not, we will use the new OTHER terms from issue #3.
  2. Should qualifiers be blank, or should I follow the GO default relation rules (MF = enables, BP = involved_in, CC = is_active_in)?
  3. Should evidence contain anything, such as a gene ID or reference, or should it just be empty?

dustine32 commented 1 year ago

@thomaspd @huaiyumi This also means that every HUMAN gene in PANTHER 15.0 should have at least three "annotations" (which may include these "unknown [aspect]" annotations), right? So some genes will have only three "unknown" annotations if there are no IBAs for that gene.
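
One way to check this invariant after the file is written: the sketch below reports every gene that is still missing an aspect. The `genes_missing_aspects` helper and the `aspect_of` lookup (term ID to "MF"/"BP"/"CC", UNKNOWN terms included, e.g. built from the ontology JSON) are hypothetical, not part of pango.

```python
ASPECTS = {"MF", "BP", "CC"}


def genes_missing_aspects(genes, annotations, aspect_of):
    """Map each gene to the GO aspects it has no annotation for.

    genes:      iterable of gene IDs (e.g. every HUMAN gene in PANTHER 15.0)
    aspect_of:  hypothetical {term_id: "MF" | "BP" | "CC"} lookup,
                UNKNOWN terms included
    Genes that cover all three aspects are left out of the result.
    """
    # Start every gene with an empty set so genes with no IBAs are reported too.
    covered = {gene: set() for gene in genes}
    for annot in annotations:
        covered.setdefault(annot["gene"], set()).add(aspect_of[annot["term"]])

    missing = {}
    for gene, aspects in covered.items():
        gap = ASPECTS - aspects
        if gap:
            missing[gene] = gap
    return missing
```

Once the UNKNOWN annotations are in place this should return an empty dict; covering all three aspects implies the three-annotation minimum.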

dustine32 commented 1 year ago

From the 2022-12-12 group meeting: the slim_terms value for any UNKNOWN annotation should be that same UNKNOWN term. For example:

    {
        "gene": "UniProtKB:12345",
        "gene_symbol": "XYZ",
        "gene_name": "Fake gene not real",
        "term": "UNKNOWN:003",
        "slim_terms": [
            "UNKNOWN:003"
        ],
        "qualifiers": "is_active_in",
        "evidence": []
    }
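
A minimal sketch of the fill step as decided so far, in Python. The function name, the `genes` metadata dict and the `aspect_of` lookup are hypothetical; the MF and BP UNKNOWN IDs are placeholders, and the qualifiers assume the GO default relations option from question 2 was chosen.

```python
from collections import defaultdict

# Only UNKNOWN:003 (CC) appears in this issue; MF and BP IDs are placeholders.
UNKNOWN_TERM = {"MF": "UNKNOWN:001", "BP": "UNKNOWN:002", "CC": "UNKNOWN:003"}
# GO default relations (question 2 above), assumed to be the chosen option.
DEFAULT_QUALIFIER = {"MF": "enables", "BP": "involved_in", "CC": "is_active_in"}


def add_unknown_annotations(annotations, genes, aspect_of):
    """Return annotations plus one UNKNOWN annotation per uncovered aspect.

    annotations: list of annotation dicts (only "gene" and "term" are read)
    genes:       {gene_id: (gene_symbol, gene_name)} for every gene to cover,
                 including genes with no annotations at all
    aspect_of:   {term_id: "MF" | "BP" | "CC"}, e.g. built from the ontology
    """
    covered = defaultdict(set)
    for annot in annotations:
        covered[annot["gene"]].add(aspect_of[annot["term"]])

    added = []
    for gene, (symbol, name) in genes.items():
        for aspect in ("MF", "BP", "CC"):
            if aspect in covered[gene]:
                continue
            term = UNKNOWN_TERM[aspect]
            added.append({
                "gene": gene,
                "gene_symbol": symbol,
                "gene_name": name,
                "term": term,
                "slim_terms": [term],  # slim is the UNKNOWN term itself
                "qualifiers": DEFAULT_QUALIFIER[aspect],
                "evidence": [],
            })
    return annotations + added
```

For gene XYZ with only the BP and MF annotations from the first example, this appends exactly the CC annotation shown above. As written it counts every annotation toward coverage, including NOT annotations.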

dustine32 commented 11 months ago

Oh good, this issue is still open! Found a new bug:

Genes that only have NOT annotations for a certain aspect are not getting the appropriate UNKNOWN annotation for that aspect. Ex:

    O14531      NOT|enables       GO:0004157
    O14531      NOT|involved_in   GO:0006208

Gene O14531 currently has only one annotation, to UNKNOWN CC, the only aspect without a NOT annotation. The NOTs above should be ignored entirely, so O14531 should have three UNKNOWN annotations, one for each aspect.
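
A possible fix, sketched in Python: compute a gene's covered aspects from positive annotations only. This assumes qualifiers are "|"-separated strings as in the listing above (e.g. "NOT|enables"); `covered_aspects` and `aspect_of` are hypothetical names, and the aspect mapping in the example is inferred from the enables/involved_in qualifiers.

```python
def covered_aspects(gene_annotations, aspect_of):
    """Return the aspects a gene has at least one positive annotation to.

    Assumes qualifiers are "|"-separated strings such as "NOT|enables";
    any annotation carrying a NOT qualifier is skipped.
    """
    aspects = set()
    for annot in gene_annotations:
        if "NOT" in annot["qualifiers"].split("|"):
            continue  # NOT annotations must not count as coverage
        aspects.add(aspect_of[annot["term"]])
    return aspects


if __name__ == "__main__":
    o14531 = [
        {"gene": "UniProtKB:O14531", "term": "GO:0004157", "qualifiers": "NOT|enables"},
        {"gene": "UniProtKB:O14531", "term": "GO:0006208", "qualifiers": "NOT|involved_in"},
    ]
    # Aspects inferred from the qualifiers (enables -> MF, involved_in -> BP).
    aspect_of = {"GO:0004157": "MF", "GO:0006208": "BP"}
    missing = {"MF", "BP", "CC"} - covered_aspects(o14531, aspect_of)
    print(sorted(missing))  # ['BP', 'CC', 'MF']
```

Both NOT annotations are skipped, so every aspect is reported as missing and O14531 would get all three UNKNOWN annotations.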