pantherdb / pango

Load all human experimental annotations and show PANTHER family if available #26

Closed dustine32 closed 1 year ago

dustine32 commented 1 year ago

We're going to start loading all experimental human annotations, regardless of whether there is a matching IBA. These will appear similar to the existing IBAs that have direct evidence (i.e., gene == with_gene_id). To differentiate these purely experimental annotations from the other IBAs, we will display the PANTHER families along with a new evidence_type field, which is set to "direct" for the experimental-only annotations.

The new evidence_type field will look like this in the annotation JSON:

{
        "gene": "UniProtKB:P17535",
        "gene_symbol": "JUND",
        "gene_name": "Transcription factor jun-D",
        "term": "GO:0008134",
        "slim_terms": [
            "OTHER:0001"
        ],
        "qualifier": null,
        "evidence_type": "homology",
        "evidence": [
            {
                "with_gene_id": "UniProtKB:P05412",
                "references": [
                    "PMID:10508860"
                ],
                "group": "UniProtKB"
            }
        ],
        "group": "GO_Central"
}

For these newly loaded experimental-only annotations, evidence_type will be "direct":

{
        "gene": "UniProtKB:A0A0C5B5G6",
        "gene_symbol": "MT-RNR1",
        "gene_name": "Mitochondrial-derived peptide MOTS-c",
        "term": "GO:0003677",
        "slim_terms": [
            "GO:0003677"
        ],
        "qualifier": null,
        "evidence_type": "direct",
        "evidence": [
            {
                "with_gene_id": "UniProtKB:A0A0C5B5G6",
                "references": [
                    "PMID:29983246"
                ],
                "group": "UniProtKB"
            }
        ],
        "group": "UniProtKB"
}
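
As a rough illustration of how the two values relate to the gene == with_gene_id rule, here is a minimal Python sketch. The function name `classify_evidence_type` is hypothetical, and the handling of mixed or empty evidence lists is an assumption; the issue only describes the clear-cut cases.

```python
def classify_evidence_type(annotation: dict) -> str:
    """Return "direct" when every evidence entry points back at the annotated gene."""
    gene = annotation["gene"]
    with_genes = [ev["with_gene_id"] for ev in annotation["evidence"]]
    # Assumption: an empty or mixed evidence list falls back to "homology".
    if with_genes and all(w == gene for w in with_genes):
        return "direct"
    return "homology"


# Trimmed-down versions of the two annotations shown above.
jund = {
    "gene": "UniProtKB:P17535",
    "evidence": [{"with_gene_id": "UniProtKB:P05412"}],
}
mots_c = {
    "gene": "UniProtKB:A0A0C5B5G6",
    "evidence": [{"with_gene_id": "UniProtKB:A0A0C5B5G6"}],
}

print(classify_evidence_type(jund))    # homology
print(classify_evidence_type(mots_c))  # direct
```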

The new panther_family field will live in the gene info JSON because it is specific to each gene. If a gene is not in a family it will be set to null:

    {
        "gene": "UniProtKB:P17535",
        "gene_symbol": "JUND",
        "gene_name": "Transcription factor jun-D",
        "gene_long_id": "",
        "taxon_id": "9606",
        "coordinates_chr_num": "19",
        "coordinates_start": "18279760",
        "coordinates_end": "18280929",
        "coordinates_strand": "-1",
        "panther_family": "PTHR11462"
    },
    {
        "gene": "UniProtKB:P05412",
        "gene_symbol": "JUN",
        "gene_name": "Transcription factor AP-1",
        "gene_long_id": "",
        "taxon_id": "9606",
        "coordinates_chr_num": "1",
        "coordinates_start": "58780788",
        "coordinates_end": "58784327",
        "coordinates_strand": "-1",
        "panther_family": null
    },

To get the name of each PANTHER family, there will be a new panther_family lookup JSON file created. Each entry will look like:

    {
        "panther_family": "PTHR11462",
        "family_name": "JUN TRANSCRIPTION FACTOR-RELATED"
    },
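
A minimal sketch of how a consumer might resolve a gene's family name from the two structures above. It assumes the lookup file is a JSON array of such entries; the helper names `build_family_index` and `family_name_for` are hypothetical.

```python
import json

# Assumption: the lookup file is a JSON array of entries shaped like the one above.
lookup_json = """[
    {"panther_family": "PTHR11462", "family_name": "JUN TRANSCRIPTION FACTOR-RELATED"}
]"""


def build_family_index(entries):
    """Map each PANTHER family identifier to its name."""
    return {entry["panther_family"]: entry["family_name"] for entry in entries}


def family_name_for(gene_info, index):
    """Return the family name for a gene info record, or None if it has no family."""
    family_id = gene_info.get("panther_family")
    if family_id is None:
        return None
    return index.get(family_id)


index = build_family_index(json.loads(lookup_json))
print(family_name_for({"gene": "UniProtKB:P17535", "panther_family": "PTHR11462"}, index))
print(family_name_for({"gene": "UniProtKB:P05412", "panther_family": None}, index))
```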

Displaying in UI

Show PANTHER family name and identifier under the coordinates. If no family, display "No PANTHER family". If there is a family, this should be a linkout to the PANTHER tree viewer site, constructed using the following pattern:

http://www.pantherdb.org/treeViewer/treeViewer.jsp?book={pthr_family}&seq={gene_long_id}
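
The display rule could be sketched roughly as follows in Python. The function name `family_display`, the "NAME (ID)" label format, and percent-encoding the long id once are all assumptions for illustration; only the URL pattern and the "No PANTHER family" text come from this issue.

```python
from urllib.parse import quote

TREE_VIEWER = (
    "http://www.pantherdb.org/treeViewer/treeViewer.jsp"
    "?book={pthr_family}&seq={gene_long_id}"
)


def family_display(gene_info, family_names):
    """Return (label, url); url is None when the gene has no PANTHER family."""
    family_id = gene_info.get("panther_family")
    if family_id is None:
        return "No PANTHER family", None
    # Assumption: label is "<family name> (<identifier>)".
    name = family_names.get(family_id, "")
    label = f"{name} ({family_id})".strip()
    # Assumption: the long id is percent-encoded once before substitution.
    seq = quote(gene_info.get("gene_long_id", ""), safe="")
    return label, TREE_VIEWER.format(pthr_family=family_id, gene_long_id=seq)


names = {"PTHR11462": "JUN TRANSCRIPTION FACTOR-RELATED"}
print(family_display({"panther_family": "PTHR11462", "gene_long_id": ""}, names))
print(family_display({"panther_family": None, "gene_long_id": ""}, names))
```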

EDIT: Added data structure to represent the gene_info lookup file, which is where the family for each gene should be declared. Moved family out of the annotation DS and added the evidence_type field.

EDIT: Changed required evidence_type value for loaded experimental-only annotations from "N/A" to "direct". These are "direct" annotations.

dustine32 commented 1 year ago

Link family URL to tree with sequence selected.

mugitty commented 1 year ago

For opening link to tree viewer, ensure the long id is encoded. For example, pantherdb.org/treeViewer/treeViewer.jsp?book=PTHR11462&seq=ORYLA%257CEnsembl%253DENSORLG00000018793%257CUniProtKB%253DH2MX33
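
For reference, the seq value in that example decodes to a long id containing `|` and `=`, and it matches what two passes of percent-encoding produce. A small Python check using the standard library's `urllib.parse.quote`; whether the viewer expects one pass or two is not stated here, so treat this only as a way to reproduce the example string.

```python
from urllib.parse import quote

# Decoded form of the seq value in the example above.
long_id = "ORYLA|Ensembl=ENSORLG00000018793|UniProtKB=H2MX33"

once = quote(long_id, safe="")   # "|" -> %7C, "=" -> %3D
twice = quote(once, safe="")     # "%" -> %25, giving %257C and %253D

print(once)   # ORYLA%7CEnsembl%3DENSORLG00000018793%7CUniProtKB%3DH2MX33
print(twice)  # ORYLA%257CEnsembl%253DENSORLG00000018793%257CUniProtKB%253DH2MX33
```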

dustine32 commented 1 year ago

Still not displaying annotation to GO:0005125. Could be problem with data.

dustine32 commented 1 year ago

Regarding the comment above (https://github.com/pantherdb/pango/issues/26#issuecomment-1439124995): this was indeed a data issue, but it should now be fixed in the code with https://github.com/pantherdb/pango/commit/d20333a145e3863a3fd8a59abc3c5fb497e55cd8.