monarch-initiative / monarch-legacy

Monarch web application and API
BSD 3-Clause "New" or "Revised" License

add gene / genomic view with phenotype variations #655

Open nathandunn opened 9 years ago

nathandunn commented 9 years ago

FYI: @selewis

Each phenotype has associated variants from SciGraph, in addition to gene location. We would like to load a "gene" via a &select tag and then dynamically insert the variants. @nlwashington will add an additional mockup.

These would be on a phenotype or disease page on the gene tab as a popup under each gene (but as a static image).

Later, this will allow us to do it dynamically for a specific gene and flip through its associated diseases and phenotypes.

selewis commented 9 years ago

Sorry, this does not make sense. Why remap the variants to gene coordinates? If both the variants and the gene are mapped to the genome, then what more could possibly need to be done?

Query/retrieval should be

  1. get gene coordinates (and strand), add padding upstream and down.
  2. get features of type gene (which may include other genes) and variants (may specify particular set of variant calls) from that region then (if needed) reverse complement if primary gene is on reverse strand.

create your static image from that.

In particular "insert the variants" seems nonsensical. Insert them into what? They are already on the genome, no need to do more. Any overlap calculations are trivial, and code should already exist for that.

-S
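
The two retrieval steps above can be sketched in a few lines. This is only an illustration under assumptions: the `Feature` record, the function names, and the default padding are hypothetical and not part of any Monarch or SciGraph API, and "reverse complement" is reduced here to flipping coordinates within the region (no sequence is involved).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    kind: str        # "gene" or "variant"
    chrom: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: int = 1  # +1 forward, -1 reverse


def padded_region(gene, padding):
    """Step 1: gene coordinates plus padding upstream and downstream."""
    return gene.chrom, max(1, gene.start - padding), gene.end + padding


def features_in_region(features, chrom, start, end):
    """Step 2: every gene or variant that overlaps the region."""
    return [f for f in features
            if f.chrom == chrom and f.start <= end and f.end >= start]


def to_display_coords(feature, region_start, region_end, gene_strand):
    """Offsets within the region, flipped when the primary gene is on the reverse strand."""
    if gene_strand >= 0:
        return feature.start - region_start, feature.end - region_start
    return region_end - feature.end, region_end - feature.start


def gene_view(gene, features, padding=1000):
    """Rows of (start, end, kind, name) ready to be drawn as a static image."""
    chrom, start, end = padded_region(gene, padding)
    hits = features_in_region(features, chrom, start, end)
    return sorted(
        to_display_coords(f, start, end, gene.strand) + (f.kind, f.name)
        for f in hits
    )
```

Nothing is remapped or "inserted": the variants keep their genome coordinates and the view is just an overlap filter plus an optional flip.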

nathandunn commented 9 years ago

There is a separate variant file for each phenotype / disease. I'm not sure if they accumulate as you drill up the ontology as well. Anyway, we are looking at upwards of 50k variant tracks. Instead, it would make more sense to load these dynamically (they can be queried out of SciGraph).

selewis commented 9 years ago

Yes, they would/could accumulate as you go up the phenotype/disease ontology to broader classes.

I would suggest using different colors for the subsuming phenotype/disease top-level classes (so that there will be <50 colors) and absolutely get all those variants in a single track. If the same variant appears more than once, then overlay/enlarge it. Use the score for the vertical position and different shapes to indicate impact (e.g. an asterisk for a stop codon, a circle for overlapping with a splice junction, etc.).
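
A rough sketch of that encoding, assuming each variant arrives as a dict with `id`, `position`, `score` (scaled to 0..1), `top_class`, and `impact` keys. Those key names, the palette, the impact labels, and the fallback shape are all made up for illustration; only the asterisk and circle come from the suggestion above.

```python
from collections import Counter

# Hypothetical lookup tables; a real palette would cover every top-level class.
CLASS_COLORS = {"HP:0000152": "#1f77b4"}
IMPACT_SHAPES = {"stop_gained": "asterisk", "splice_site": "circle"}
DEFAULT_COLOR = "#7f7f7f"
DEFAULT_SHAPE = "diamond"


def variants_to_glyphs(variants, track_height=100, base_size=4):
    """Collapse variant records into one glyph per distinct variant on a single track."""
    counts = Counter(v["id"] for v in variants)
    glyphs = {}
    for v in variants:
        if v["id"] in glyphs:
            continue  # a repeated variant is drawn once and enlarged via its count
        glyphs[v["id"]] = {
            "id": v["id"],
            "x": v["position"],
            "y": round(v["score"] * track_height),  # score sets the vertical position
            "color": CLASS_COLORS.get(v["top_class"], DEFAULT_COLOR),
            "shape": IMPACT_SHAPES.get(v["impact"], DEFAULT_SHAPE),
            "size": base_size * counts[v["id"]],
        }
    return list(glyphs.values())
```

Keeping everything in one list of glyphs is what lets all phenotypes share a single track instead of one track per variant file.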

nathandunn commented 9 years ago

I think the trick is to create a massive javascript array of variants based on the relevant phenotype and disease data and then use callbacks to decorate as you described above. That might end up being easier than the Trellis-type approach. I'll have to see if that's feasible.

nathandunn commented 9 years ago

The API looks like it might be able to drag in an external track: http://gmod.org/wiki/JBrowse_Configuration_Guide#Writing_JBrowse-compatible_Web_Services

selewis commented 9 years ago

I was thinking perhaps we could create an Apollo-slim of HPO and map all of the phenotypes up to those. Even better would be to make it dynamic, so that the more zoomed in you were, the more resolution you would get in the HPO classes.

-S

nathandunn commented 9 years ago

The idea is to make dynamic queries using SciGraph based on the appropriate identifier, which should give you the results you alluded to above: https://github.com/SciCrunch/SciGraph/issues/66

mellybelly commented 9 years ago

@nathandunn @selewis will this be ready for our May release?

nathandunn commented 9 years ago

http://duckworth.crbs.ucsd.edu:9000/scigraph/graph/neighbors/_:genid950466.json?depth=1&blankNodes=true&direction=both

http://geoffrey.crbs.ucsd.edu:8080/solr/feature-location/select/?q=*%3A*&wt=json

nathandunn commented 9 years ago

so use this type of query:

http://rosie.crbs.ucsd.edu:9000/scigraph/dynamic/phenotypes_with_gene.json?gene_id={gene_id}

http://rosie.crbs.ucsd.edu:9000/scigraph/dynamic/genes_with_phenotype.json?phenotype_id={phenotype_id}

http://rosie.crbs.ucsd.edu:9000/scigraph/dynamic/genotypes_from_gene.json?gene_id={gene_id}

where the gene_id would come from the gene page URL on Monarch, I think:

@ccondit @nlwashington @kshefchek

http://monarchinitiative.org/gene/NCBIGene:60641
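
As a sketch of that idea, the gene identifier can be pulled from the page path and substituted into the URL templates quoted in this comment. The function names are hypothetical, and the templates are copied as written here, so they may not match whatever the SciGraph service currently exposes.

```python
from urllib.parse import quote, urlparse

BASE = "http://rosie.crbs.ucsd.edu:9000/scigraph/dynamic"
# Only the gene-keyed templates from this comment; the phenotype one works the same way.
TEMPLATES = {
    "phenotypes_with_gene": BASE + "/phenotypes_with_gene.json?gene_id={gene_id}",
    "genotypes_from_gene": BASE + "/genotypes_from_gene.json?gene_id={gene_id}",
}


def gene_id_from_page_url(page_url):
    """Extract e.g. 'NCBIGene:60641' from a .../gene/NCBIGene:60641 page URL."""
    path = urlparse(page_url).path
    prefix = "/gene/"
    if not path.startswith(prefix):
        raise ValueError("not a gene page URL: " + page_url)
    return path[len(prefix):]


def dynamic_query_urls(page_url):
    """Fill each template with the gene id taken from the page URL."""
    gene_id = quote(gene_id_from_page_url(page_url), safe=":")
    return {name: t.format(gene_id=gene_id) for name, t in TEMPLATES.items()}
```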

Here is a sample of what we (@EstiiCoder) have with purely fake / external data:

http://tartini.crbs.ucsd.edu/labs/chromosome-vis-demo

We want to replace the backend DAS code with real code, and introduce variant data.

A couple of questions:

  1. Where is the GOLR manual? I can't seem to find it.
  2. Which servers should we be using for SciGraph / GOLR queries?
  3. Which API should we be using for grabbing the variant / allele data? It looks like GOLR will work.

nathandunn commented 9 years ago

@EstiiCoder here is a good example page: http://tartini.crbs.ucsd.edu/gene/NCBIGene:6736

I think stuff like this should work:

http://geoffrey.crbs.ucsd.edu:8080/solr/golr/select?defType=edismax&qt=standard&indent=on&wt=json&rows=10&start=0&fl=*,score&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&fq=object_closure:%22NCBIGene:6736%22&fq=subject_category:%22variant%22&facet.field=subject_category&facet.field=subject_taxon&facet.field=subject_closure_label&facet.field=relation_closure_label&facet.field=qualifier&facet.field=evidence_closure_label&facet.field=object_category&facet.field=object_closure_label&q=*:*&packet=1&callback_type=search&json.wrf=jQuery111005003433541860431_1433966781607&_=1433966781608
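
The same query is easier to read when built from named parameters. The sketch below keeps only the parts that select variants for a gene (the two `fq` filters) and drops the faceting and JSONP callback parameters. The host and field names are taken from the URL above; the function name is made up, and whether the trimmed query returns what the full one does is an assumption.

```python
from urllib.parse import urlencode

GOLR_SELECT = "http://geoffrey.crbs.ucsd.edu:8080/solr/golr/select"


def variant_query_url(gene_id, rows=10, start=0):
    """Build a GOLR select URL for variants whose object closure includes gene_id."""
    params = [
        ("defType", "edismax"),
        ("qt", "standard"),
        ("wt", "json"),
        ("rows", rows),
        ("start", start),
        ("fl", "*,score"),
        ("q", "*:*"),
        # Repeated fq filters narrow the result set: this gene, variants only.
        ("fq", 'object_closure:"{}"'.format(gene_id)),
        ("fq", 'subject_category:"variant"'),
    ]
    return GOLR_SELECT + "?" + urlencode(params)
```

Calling `variant_query_url("NCBIGene:6736")` gives a URL for the example gene page; `urlencode` takes care of escaping the quotes and colons.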

cmungall commented 9 years ago

nice!

I note we don't have the yaml for the schema underlying this golr query - can you make sure it gets added?

nathandunn commented 9 years ago

I think that @kshefchek or @nlwashington had started on it.

We have enough info to pull off variants in the interim (I think). It's mostly there, but it's not pulling back variant info. Do we need to write another query for that, or just add facets? Not sure who to connect with to make this happen.

ccondit commented 9 years ago

@nathandunn - those REST resources have changed. See: http://rosie.crbs.ucsd.edu:9000/scigraph/docs/#!/dynamic for the new URLs.

cmungall commented 9 years ago

I confess I'm not fully grokking this ticket and what your requirements are. Maybe just restate what we're trying to achieve here in high-level terms?

nathandunn commented 9 years ago

@cmungall No worries . . I think with @ccondit 's link and the GOLR manual we are good. I think I just need to connect with Nicole a bit later today.

nathandunn commented 9 years ago

@cmungall We are grabbing variants for a phenotype or disease and displaying them here:

http://tartini.crbs.ucsd.edu/labs/chromosome-vis-demo

We just need live variant data (what we have right now from GOLR seems to lack location, but we haven't delved into the API).

nathandunn commented 9 years ago

@ccondit Thanks. This looks great. To get variants for a set of features, for example this one:

http://tartini.crbs.ucsd.edu/phenotype/HP:0000152

The dynamic API indicates doing it this way:

http://rosie.crbs.ucsd.edu:9000/scigraph/dynamic/phenotypes/HP:0000152/features

Is there an API to expand this to grab the variant data as well? (Right now it's going through GOLR.) It would be nice to understand it both ways.

cmungall commented 9 years ago

So the idea is to have on or connected to this page all human chromosomes with all variants or genes associated with that phenotype indicated?

In that case it seems like the /feature-location/ schema needs to be extended to include a closure field for the ontology class used to annotate the feature.

nathandunn commented 9 years ago

Yeah, I think that @nlwashington is working on that (probably already done ;) ). I'll just need to double-check with her to see where we're at.

cmungall commented 9 years ago

Nicole is traveling but I think @ccondit can provide you with what you need.

Note for a high level phenotype like http://tartini.crbs.ucsd.edu/phenotype/HP:0000152 we are potentially talking about 100ks of features. For an ideogram, I assume you only need a tuple (chrom,band,phenotype,count). If the goal is to sum to a high level phenotype category then this should probably be done server-side too.
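
To make the tuple concrete, here is a minimal sketch of the kind of roll-up that could run server-side. The feature dict keys (`chrom`, `band`, `phenotype`) and the `category_of` mapping from specific to high-level phenotype are assumptions for illustration, not an existing schema.

```python
from collections import Counter


def ideogram_counts(features, category_of):
    """Reduce feature records to sorted (chrom, band, phenotype, count) tuples.

    category_of maps a specific phenotype id to its high-level category;
    ids missing from the mapping are counted under their own id.
    """
    counts = Counter()
    for f in features:
        category = category_of.get(f["phenotype"], f["phenotype"])
        counts[(f["chrom"], f["band"], category)] += 1
    return sorted(
        (chrom, band, pheno, n) for (chrom, band, pheno), n in counts.items()
    )
```

With this shape the client receives at most one row per band and category, rather than 100ks of individual features.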

nathandunn commented 9 years ago

@cmungall I think so as well. We are trying to accomplish two things:

  1. bring back band data (which you have up there)
  2. display variants (w/ location) for specific phenotypes, etc. (the dots on the graph)

Do you know if we have a GOLR manual / API as well?

cmungall commented 9 years ago

There is a JS API that wraps the Solr calls; we're already using it in the beta Monarch app code. @kltm and @kshefchek can help here.

nathandunn commented 9 years ago

@cmungall Thanks!