monarch-initiative / biolink-api

API for linked biological knowledge
https://api.monarchinitiative.org/api/
BSD 3-Clause "New" or "Revised" License

Add the ability to get association counts for generic routes #231

Closed deepakunni3 closed 5 years ago

deepakunni3 commented 5 years ago

Add ability to get association counts for:

The response looks like so:

{
  "taxon": {
    "id": "NCBITaxon:9606",
    "label": "Homo sapiens"
  },
  "association_counts": {
    "gene": 79,
    "interaction": 58,
    "homolog": 51,
    "phenotype": 84,
    "anatomical entity": 20,
    "biological process": 14,
    "pathway": 6,
    "disease": 4,
    "publication": 32,
    "variant": 13
  },
  "xrefs": null,
  "description": null,
  "categories": [
    "gene",
    "sequence feature"
  ],
  "types": null,
  "synonyms": null,
  "deprecated": null,
  "replaced_by": null,
  "consider": null,
  "id": "HGNC:18603",
  "label": "COL25A1"
}

Note: This PR is experimental, and I would like feedback from @cmungall @kshefchek @putmantime

deepakunni3 commented 5 years ago

Related to https://github.com/biolink/biolink-api/issues/168

kshefchek commented 5 years ago

Looks great! For genes, we'll also want the count of phenotypes linked to all orthologs, and the count of diseases associated with orthologs. If you're testing with our dev index, these won't show up, since a bad version of PANTHER made it through.

deepakunni3 commented 5 years ago

@kshefchek I guess that will have to be a separate Solr query, right? Just wondering if it can be done in a more efficient way.

kshefchek commented 5 years ago

For stats via orthology, I think we could get the breakdown by type (gene, phenotype, disease, function) and, for each type, get the stats per taxon. Alternatively, we could get the breakdown of taxon per relation. The latter would allow us to disambiguate gene-gene data (interaction vs. orthology) but is harder to process.

Example taxon per category: https://solr.monarchinitiative.org/solr/golr/select/?defType=edismax&qt=standard&indent=on&wt=json&rows=0&start=0&fl=*,score&facet=true&facet.mincount=1&json.nl=arrarr&facet.limit=20&facet.method=enum&fq=subject_ortholog_closure:%22MGI:98297%22&q=*:*&stats=true&stats.field={!tag=piv1%20calcdistinct=true%20distinctValues=false}object&facet.pivot={!stats=piv1}object_category,subject_taxon

Taxon per relation: https://solr.monarchinitiative.org/solr/golr/select/?defType=edismax&qt=standard&indent=on&wt=json&rows=0&start=0&fl=*,score&facet=true&facet.mincount=1&json.nl=arrarr&facet.limit=20&facet.method=enum&fq=subject_ortholog_closure:%22MGI:98297%22&q=*:*&stats=true&stats.field={!tag=piv1%20calcdistinct=true%20distinctValues=false}object&facet.pivot={!stats=piv1}relation_label,subject_taxon

The latter is more informative, but we would need to disambiguate relations like has_phenotype, which is sometimes overloaded and used for both gene-disease and disease-phenotype associations.
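
To make the shape of these pivot results concrete, here is a minimal sketch that flattens a nested pivot into (relation, taxon) rows. It assumes the standard Solr `facet_pivot` layout (a list of `field`/`value`/`count` entries with an optional nested `pivot`); the sample data and the `flatten_pivot` helper are hypothetical, and the per-pivot `stats` block requested via `{!stats=piv1}` is ignored here.

```python
def flatten_pivot(pivot_rows, prefix=()):
    """Yield (path, count) for every leaf of a Solr facet_pivot tree.

    `path` is a tuple with one value per pivot level, e.g.
    (relation_label, subject_taxon).
    """
    for row in pivot_rows:
        path = prefix + (row["value"],)
        children = row.get("pivot")
        if children:
            # Descend into the next pivot level.
            yield from flatten_pivot(children, path)
        else:
            yield path, row["count"]


# Hypothetical, hand-written sample; the values are NOT real Monarch data.
sample = {
    "facet_counts": {
        "facet_pivot": {
            "relation_label,subject_taxon": [
                {
                    "field": "relation_label",
                    "value": "has phenotype",
                    "count": 12,
                    "pivot": [
                        {"field": "subject_taxon", "value": "NCBITaxon:10090", "count": 9},
                        {"field": "subject_taxon", "value": "NCBITaxon:7955", "count": 3},
                    ],
                },
                {
                    "field": "relation_label",
                    "value": "interacts with",
                    "count": 5,
                    "pivot": [
                        {"field": "subject_taxon", "value": "NCBITaxon:9606", "count": 5},
                    ],
                },
            ]
        }
    }
}

pivot = sample["facet_counts"]["facet_pivot"]["relation_label,subject_taxon"]
rows = list(flatten_pivot(pivot))
for path, count in rows:
    print(" / ".join(path), count)
```

Note that `count` here is the number of matching association documents at each leaf, not the number of distinct objects; the distinct counts would have to be read from the attached stats instead.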

This call would include everything (type per taxa per relation) but takes longer to finish: https://solr.monarchinitiative.org/solr/golr/select/?defType=edismax&qt=standard&indent=on&wt=json&rows=0&start=0&fl=*,score&facet=true&facet.mincount=1&json.nl=arrarr&facet.limit=20&facet.method=enum&fq=subject_ortholog_closure:%22MGI:98297%22&q=*:*&stats=true&stats.field={!tag=piv1%20calcdistinct=true%20distinctValues=false}object&facet.pivot={!stats=piv1}relation_label,subject_taxon,object_category
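
The three URLs above differ only in the `facet.pivot` field list, so they can be generated from one helper. The following is a rough sketch, not the code used in biolink-api or ontobio: the function name `ortholog_pivot_url` is made up for illustration, and the parameters are copied from the URLs in this comment.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

GOLR_SELECT = "https://solr.monarchinitiative.org/solr/golr/select/"


def ortholog_pivot_url(gene_id, pivot_fields, facet_limit=20):
    """Build a GOlr select URL that pivots a gene's ortholog-closure
    associations over the given fields (hypothetical helper)."""
    params = [
        ("defType", "edismax"),
        ("qt", "standard"),
        ("indent", "on"),
        ("wt", "json"),
        ("rows", 0),  # only facets are needed, not documents
        ("start", 0),
        ("fl", "*,score"),
        ("facet", "true"),
        ("facet.mincount", 1),
        ("json.nl", "arrarr"),
        ("facet.limit", facet_limit),
        ("facet.method", "enum"),
        ("fq", f'subject_ortholog_closure:"{gene_id}"'),
        ("q", "*:*"),
        ("stats", "true"),
        # Tag the stats so they can be attached to each pivot bucket.
        ("stats.field", "{!tag=piv1 calcdistinct=true distinctValues=false}object"),
        ("facet.pivot", "{!stats=piv1}" + ",".join(pivot_fields)),
    ]
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return GOLR_SELECT + "?" + urlencode(params, quote_via=quote)


# The three variants discussed above.
per_category = ortholog_pivot_url("MGI:98297", ["object_category", "subject_taxon"])
per_relation = ortholog_pivot_url("MGI:98297", ["relation_label", "subject_taxon"])
everything = ortholog_pivot_url(
    "MGI:98297", ["relation_label", "subject_taxon", "object_category"]
)

# Round-trip check: decode the query string and show the pivot spec.
print(parse_qs(urlsplit(everything).query)["facet.pivot"][0])
```

The helper percent-encodes more characters than the hand-written URLs do (for example `:` and `*`), which decodes to the same parameter values.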

I think for now let's go with a simple call (e.g., category per relation, as you have done previously) and have a separate call for the more complex query.

deepakunni3 commented 5 years ago

Added counts for ortholog associations using suggestions made by @kshefchek

New response looks like so:

{
    "taxon": {
        "id": "NCBITaxon:9606",
        "label": "Homo sapiens"
    },
    "association_counts": {
        "interactions": 58,
        "homologs": 51,
        "phenotypes": 84,
        "anatomy": 20,
        "functions": 14,
        "pathways": 6,
        "diseases": 4,
        "publications": 32,
        "variants": 13,
        "ortholog-interactions": 104,
        "ortholog-anatomy": 24,
        "ortholog-functions": 17,
        "ortholog-phenotypes": 18,
        "ortholog-pathways": 6
    },
    "xrefs": null,
    "description": null,
    "categories": [
        "gene",
        "sequence feature"
    ],
    "types": null,
    "synonyms": null,
    "deprecated": null,
    "replaced_by": null,
    "consider": null,
    "id": "HGNC:18603",
    "label": "COL25A1"
}

deepakunni3 commented 5 years ago

@kshefchek Could you take a look at this? I think all the necessary counts are being returned. Wanted to see if I am interpreting the counts properly.
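
For anyone consuming this response, here is a small sketch of one way to separate the direct counts from the ortholog-derived ones. It assumes only that ortholog counts carry the `ortholog-` key prefix, as in the response above; `split_counts` is a hypothetical helper and not part of the API.

```python
ORTHOLOG_PREFIX = "ortholog-"


def split_counts(association_counts):
    """Split association counts into (direct, via_orthologs) dicts.

    Keys in the second dict have the 'ortholog-' prefix removed so the
    two dicts can be compared category by category.
    """
    direct, via_orthologs = {}, {}
    for key, count in association_counts.items():
        if key.startswith(ORTHOLOG_PREFIX):
            via_orthologs[key[len(ORTHOLOG_PREFIX):]] = count
        else:
            direct[key] = count
    return direct, via_orthologs


# The association_counts object from the response for HGNC:18603 above.
counts = {
    "interactions": 58,
    "homologs": 51,
    "phenotypes": 84,
    "anatomy": 20,
    "functions": 14,
    "pathways": 6,
    "diseases": 4,
    "publications": 32,
    "variants": 13,
    "ortholog-interactions": 104,
    "ortholog-anatomy": 24,
    "ortholog-functions": 17,
    "ortholog-phenotypes": 18,
    "ortholog-pathways": 6,
}

direct, via_orthologs = split_counts(counts)
for category, count in via_orthologs.items():
    print(f"{category}: direct={direct.get(category)}, via orthologs={count}")
```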

Note: To get this to work you would have to use ontobio@master

kshefchek commented 5 years ago

+1, thanks for adding this!

deepakunni3 commented 5 years ago

Awesome! 👍