monarch-initiative / monarch-ui

The previous version of the Monarch Initiative website
https://previous.monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Variant page takes too long (~7 min) to load #197

Closed monicacecilia closed 5 years ago

monicacecilia commented 5 years ago

What I was doing: In the search bar, on the home page or elsewhere, entering the name gana-1 (a worm gene) makes the autocomplete offer an option to go to a page describing a generic (not species-specific) variant.

Screen Shot 2019-09-07 at 11 49 06 PM

The main problem:

Other oddities to look into:

===============

Screen Shot 2019-09-07 at 10 30 47 PM

kshefchek commented 5 years ago

I think this is an issue with the edge weighting. When we originally added edge weights, they were applied only after a user selected the gene radio button, since many genes have the same symbol (ignoring case). But over time something changed, and we now include edge weights on all queries. Removing the edge weighting seems to remove the SO class:

https://api-dev.monarchinitiative.org/api/search/entity/autocomplete/gana-1?rows=10&start=0&highlight_class=hilite&boost_q=category%3Agenotype^-10&prefix=-OMIA&category=gene&category=variant&category=genotype&category=phenotype&category=disease&category=goterm&category=pathway&category=anatomy&category=substance&category=individual&category=publication&category=model&category=anatomical+entity

There are some odd categories in here: goterm isn't a valid category, and we should remove substance and individual too.

Also, we've (maybe intentionally) removed the highlighting?
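As a rough illustration of the category cleanup suggested above, the sketch below rebuilds the autocomplete URL with goterm, substance, and individual filtered out. The parameter names and values are copied from the URL in this comment; the helper name `autocomplete_url` and the `DROPPED` set are hypothetical and not part of the Monarch codebase.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the URL quoted in the comment above.
BASE = "https://api-dev.monarchinitiative.org/api/search/entity/autocomplete/"

# Categories exactly as they appear in that URL, in order.
REQUESTED = [
    "gene", "variant", "genotype", "phenotype", "disease", "goterm",
    "pathway", "anatomy", "substance", "individual", "publication",
    "model", "anatomical entity",
]

# Categories the comment proposes to remove (assumption: nothing else changes).
DROPPED = {"goterm", "substance", "individual"}


def autocomplete_url(term, categories=REQUESTED, rows=10, start=0):
    """Build the autocomplete URL, skipping the dropped categories."""
    params = [
        ("rows", rows),
        ("start", start),
        ("highlight_class", "hilite"),
        ("boost_q", "category:genotype^-10"),
        ("prefix", "-OMIA"),
    ]
    # Repeated "category" keys, one per remaining category.
    params += [("category", c) for c in categories if c not in DROPPED]
    return BASE + quote(term) + "?" + urlencode(params)


url = autocomplete_url("gana-1")
```

Building the query from a list of pairs keeps the repeated `category` parameter explicit, so removing a category is a one-line change rather than a string edit.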

deepakunni3 commented 5 years ago

https://beta.monarchinitiative.org/variant/SO:0000694 - request hangs for nearly 7 minutes before displaying results.

This is primarily because the term itself is a generic ontology term and in Monarch's Knowledge Graph we have thousands of associations to this term.

This is a summary of the number of associations where SO:0000694 is either the subject or the object of an association:

{
  "facet_pivot": {
    "association_type": [
      {
        "count": 98197,
        "field": "association_type",
        "value": "variant_phenotype"
      },
      {
        "count": 61618,
        "field": "association_type",
        "value": "variant_gene"
      },
      {
        "count": 33048,
        "field": "association_type",
        "value": "variant_disease"
      },
      {
        "count": 110,
        "field": "association_type",
        "value": "variant_genotype"
      },
      {
        "count": 108,
        "field": "association_type",
        "value": "case_variant"
      },
      {
        "count": 222,
        "field": "association_type",
        "value": "model_variant"
      },
      {
        "count": 116491,
        "field": "association_type",
        "value": "publication_variant"
      }
    ]
  }
}
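To put the facet counts above in perspective, here is a minimal sketch that totals them and ranks the association types by size. The counts are copied from the JSON above; the `summarize` helper is only illustrative.

```python
# Counts copied from the facet_pivot summary above.
FACETS = {
    "facet_pivot": {
        "association_type": [
            {"count": 98197, "field": "association_type", "value": "variant_phenotype"},
            {"count": 61618, "field": "association_type", "value": "variant_gene"},
            {"count": 33048, "field": "association_type", "value": "variant_disease"},
            {"count": 110, "field": "association_type", "value": "variant_genotype"},
            {"count": 108, "field": "association_type", "value": "case_variant"},
            {"count": 222, "field": "association_type", "value": "model_variant"},
            {"count": 116491, "field": "association_type", "value": "publication_variant"},
        ]
    }
}


def summarize(facets):
    """Return (total, rows sorted by count descending)."""
    rows = facets["facet_pivot"]["association_type"]
    ranked = sorted(rows, key=lambda r: r["count"], reverse=True)
    total = sum(r["count"] for r in ranked)
    return total, ranked


total, ranked = summarize(FACETS)
for row in ranked:
    share = 100 * row["count"] / total
    print(f'{row["value"]:<22} {row["count"]:>7} {share:5.1f}%')
print(f'{"total":<22} {total:>7}')
```

The total comes to 309794 associations, with publication_variant and variant_phenotype accounting for the bulk of them.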

The 7-minute hang occurs because the UI fires three BioLink API queries:

  1. To fetch the graph neighborhood for SO:0000694 (7 minutes)
  2. Fetch association counts (instant)
  3. Expand the CURIE to a URI (instant)

The first query takes the longest because it goes to SciGraph, where it fetches all the nodes and edges at a depth of 1 (the default). Even at a depth of 1, the query returns 94812 nodes and 94814 edges, which is a considerably large result.
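The timing of the three queries can be illustrated with a small simulation. This is only a sketch under stated assumptions: the query names and delays are stand-ins (the real neighborhood query takes about 7 minutes, not 50 ms), no network calls are made, and it does not describe how the Monarch UI actually schedules its requests. It shows that when the queries run concurrently and are consumed as they complete, the two fast ones are available long before the slow one.

```python
import asyncio


async def fake_query(name, delay):
    """Stand-in for a BioLink API call; sleeps instead of hitting the network."""
    await asyncio.sleep(delay)
    return name


async def run_all():
    # Delays are scaled-down placeholders for "7 minutes" vs "instant".
    queries = [
        fake_query("graph_neighborhood", 0.05),
        fake_query("association_counts", 0.0),
        fake_query("curie_expansion", 0.0),
    ]
    finished = []
    # as_completed yields results in completion order, not submission order.
    for next_done in asyncio.as_completed(queries):
        finished.append(await next_done)
    return finished


order = asyncio.run(run_all())
print(order)
```

In this simulation the slow neighborhood query always finishes last, so anything that depends only on the counts or the expanded URI would not need to wait for it.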

Possible solutions:

monicacecilia commented 5 years ago

@deepakunni3 if this type of query is inevitable, then the option for pagination sounds reasonable.
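For reference, the pagination idea could look roughly like the sketch below. It assumes a hypothetical `fetch_page(start, rows)` callable that returns one slice of results; the `rows` and `start` names mirror the parameters in the autocomplete URL earlier in this thread, but whether the neighborhood endpoint accepts them is an assumption, not something confirmed here.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]],
             rows: int = 100) -> Iterator[List[dict]]:
    """Yield pages of at most `rows` items until the source is exhausted."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        if not page:
            return
        yield page
        if len(page) < rows:
            # A short page means there is nothing left to fetch.
            return
        start += rows


# Illustrative in-memory data standing in for the associations of a term.
FAKE_ASSOCIATIONS = [{"id": i} for i in range(250)]


def fake_fetch(start: int, rows: int) -> List[dict]:
    """Hypothetical backend: slices a local list instead of calling an API."""
    return FAKE_ASSOCIATIONS[start:start + rows]


page_sizes = [len(page) for page in paginate(fake_fetch, rows=100)]
print(page_sizes)
```

With a generator, the UI could render the first page as soon as it arrives and request further pages only when the user asks for them.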


@kshefchek

monicacecilia commented 5 years ago

Also, this seems to be an edge case and not a mission-critical fix for v1.0, so I am assigning it to the next milestone.

monicacecilia commented 5 years ago

Updates from @kshefchek already took care of this. Yay! You get a :bowtie: !!