statgen / locuszoom

A Javascript/d3 embeddable plugin for interactively visualizing statistical genetic data from customizable sources.
https://statgen.github.io/locuszoom/
MIT License

Annotate GWAS Catalog SNPs in LZ plot #124

Closed sgagliano closed 6 years ago

sgagliano commented 6 years ago

Have an option to highlight variants that appear in the GWAS Catalog in the LZ plot and, if possible, also a way (maybe a toggle box) to see which trait(s) each variant is associated with in the GWAS Catalog. Purpose of this feature: it would provide a quick visual way to identify whether the locus is known to be associated with a trait of interest, even if the reference variant itself is not known to be associated.

abought commented 6 years ago

I could see a use case for this. The API docs suggest that we do have rsID information in our database, though I can't seem to find it in the response payloads for any given SNP...

That said, our API server isn't the only way to get data into LZjs; you can use your own custom datasource if you have your own project using LZjs.

I think the basic tooltip link, and maybe some visual convenience, would be feasible to add using existing features. Let me know how I can help.

welchr commented 6 years ago

We would need to implement the GWAS catalog API and database tables, which is a small amount of effort since the data is so small.

If I recall correctly, the reason we needed to mirror this was because the EBI GWAS catalog API did not support region queries, which we need for LZ.

Sarah notes they would like to have this for their paper, but it sounds like the submission date is still up in the air. This of course has to take a back seat to the portal priorities.

abecasis commented 6 years ago

Hey Ryan,

I really would love to see these annotations enabled and I would be happy to bump them above some of the portal priorities. :)

I am not sure if we should implement an API direct to the GWAS catalog, since their version of the data often has a lot of junk mixed in. It may be easier for us to create our own hits table and that would let us (for example) annotate UK Biobank peaks or some other interesting results of our choosing.

Goncalo

(Replying by email to Ryan Welch's comment above, sent Mon, Feb 26, 2018 at 2:10 PM.)

abought commented 6 years ago

Spoke to Sarah a bit more, and it seems there are two approaches.

The "easy" version of this feature would be for our API to return the rsID alongside each variant's other data (chrom/pos/ref/alt), or "null" if the variant had no rsID. This would answer the question "are any of these SNPs known in the catalog at all?" and would be enough for us to mark "catalog variants" in the plot and put links in tooltips.
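A minimal sketch of the "easy" version, assuming each variant arrives as a dict with chrom/pos/ref/alt plus an `rsid` that is `None` when missing, and that a set of catalog rsIDs is available separately. Both shapes and the `mark_catalog_variants` name are hypothetical, not the actual API payload. As pointed out later in the thread, having an rsID only means dbSNP knows the variant, so the sketch checks catalog membership explicitly:

```python
def mark_catalog_variants(variants, catalog_rsids):
    """Return copies of `variants` with an added boolean `in_catalog` flag.

    Hypothetical input shape: dicts with chrom/pos/ref/alt and `rsid`
    (None when the variant has no rsID). A variant is flagged only if it
    has an rsID *and* that rsID is in the catalog set.
    """
    marked = []
    for v in variants:
        rsid = v.get("rsid")
        in_catalog = rsid is not None and rsid in catalog_rsids
        marked.append({**v, "in_catalog": in_catalog})  # leave input untouched
    return marked
```

The flagged list could then drive a simple highlight or tooltip link per variant.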

The more labor-intensive version would be to link the catalog to specific traits ("show me SNPs in this region that have been associated with Alzheimer's").

It's possible this could be done in two stages, depending on how big a jump in complexity the second version involves.

abecasis commented 6 years ago

Can we wireframe this? We might be thinking about different things. This is what I imagine:

[Wireframe image not preserved.]

In my mind, we would like to have a table of hits that can be queried by (e.g.) position and that returns chr, position, label, and URI.

Then, each matching SNP in the region gets an arrow and the corresponding label. The labels could work just like the PheWAS labels.
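A minimal in-memory sketch of the hits table described above: it can be queried by region and returns chr, position, label, and URI for each hit. The `Hit`/`HitsTable` names and the sorted-list-plus-`bisect` indexing are illustrative assumptions, not the portal's actual schema; a database-backed version would more likely use an index on (chrom, pos). Region queries are the capability the thread notes the EBI API lacked:

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chrom: str
    pos: int
    label: str
    uri: str


class HitsTable:
    """Hypothetical in-memory hits table indexed for region queries."""

    def __init__(self, hits):
        self._by_chrom = {}
        for hit in hits:
            self._by_chrom.setdefault(hit.chrom, []).append(hit)
        for chrom_hits in self._by_chrom.values():
            chrom_hits.sort(key=lambda h: h.pos)
        # Parallel position lists so bisect can locate range boundaries.
        self._positions = {
            chrom: [h.pos for h in chrom_hits]
            for chrom, chrom_hits in self._by_chrom.items()
        }

    def region(self, chrom, start, end):
        """Return hits on `chrom` with start <= pos <= end, sorted by position."""
        positions = self._positions.get(chrom, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        return self._by_chrom.get(chrom, [])[lo:hi]
```

Each returned `Hit` carries exactly what a label track would need: where to draw the arrow and what text and link to attach.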

Goncalo

(Replying by email to Andy Boughton's comment above, sent Mon, Feb 26, 2018 at 2:45 PM.)

welchr commented 6 years ago

rsID does not determine whether the variant is in the EBI GWAS catalog, only whether the variant is known to dbSNP. As a side note, we do have rsIDs from dbSNP available via /annotation/snps/ and /annotation/snps/results/.

Completely agree that we do not want to have a passthrough API to their own, as their catalog data requires cleaning and tidying before use. It actually isn't even an option, since they don't support region queries.

Overall, it sounds like this feature just follows the typical cycle:

sgagliano commented 6 years ago

The GWAS Catalog is on build 38, so we need to keep that in mind when matching by chr:pos. Since Gonçalo mentioned returning the URI, I wanted to point out that EFO-mapped trait annotations are in v1.0.1 (but not v1.0).

Ryan, Andy- let me know how I can help with this.

welchr commented 6 years ago

That is a good point. Luckily we have data for most endpoints in both GRCh37 and 38, so we could start with 38 and map back to 37 later.

I'm guessing we should match on chr:pos_ref/alt, but I can't recall off the top of my head whether the EBI GWAS catalog provides REF/ALT alleles or just effect/non-effect alleles... (updated comment above)

sgagliano commented 6 years ago

I believe the Catalog provides the risk/effect allele. So matching by ref & alt alleles won't be straightforward.
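Because the catalog reports a risk/effect allele rather than REF/ALT, one option is to match on chr:pos first and then classify the risk allele against the variant's alleles. This is a minimal sketch under stated assumptions: the `chrom:pos_ref/alt` string format, the function names, and the three-way result are hypothetical. It does not handle strand flips or palindromic (A/T, C/G) alleles, and it assumes both sides use the same genome build (the catalog is on GRCh38, as noted above):

```python
def parse_variant(variant_id):
    """Split an id like '1:1000_A/G' into (chrom, pos, ref, alt)."""
    chrom_pos, alleles = variant_id.split("_", 1)
    chrom, pos = chrom_pos.split(":")
    ref, alt = alleles.split("/")
    return chrom, int(pos), ref, alt


def catalog_match(variant_id, cat_chrom, cat_pos, risk_allele):
    """Classify how a catalog entry relates to a chr:pos_ref/alt variant.

    Returns 'alt' or 'ref' when the risk allele equals that allele,
    'position_only' when only the position agrees (e.g. the risk allele is
    unknown or is neither ref nor alt), and None when the position differs.
    """
    chrom, pos, ref, alt = parse_variant(variant_id)
    if (chrom, pos) != (cat_chrom, cat_pos):
        return None
    if risk_allele == alt:
        return "alt"
    if risk_allele == ref:
        return "ref"
    return "position_only"
```

Keeping `position_only` as a separate outcome lets the plot still mark a catalog hit at that position without claiming which allele it refers to.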

welchr commented 6 years ago

@sgagliano I think Goncalo wanted UKBB GWAS hits as well. Do you have those?

welchr commented 6 years ago

@abecasis (or anyone) - How likely is it someone would want to see GWAS hits from previous catalogs? Or would they just want always the latest catalog? Need to know whether to support the same catalog over multiple revisions or just store the latest.
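The two options differ mainly in whether rows are keyed by catalog release. A minimal sketch of the multi-revision option, assuming hypothetical release tags that sort chronologically (e.g. ISO dates such as "2018-03-01"); the `VersionedCatalog` name and row shape are illustrative only:

```python
class VersionedCatalog:
    """Keep every loaded catalog release; queries default to the newest one."""

    def __init__(self):
        self._releases = {}  # release tag -> list of catalog rows

    def load(self, release, rows):
        # Re-loading the same tag replaces that release's rows.
        self._releases[release] = list(rows)

    def latest_release(self):
        # Assumes tags sort chronologically, e.g. ISO dates like "2018-03-01".
        return max(self._releases)

    def rows(self, release=None):
        """Rows for `release`, or for the latest release when none is given."""
        if release is None:
            release = self.latest_release()
        return self._releases[release]
```

If only the latest catalog is ever needed, `load` could simply discard older releases, which reduces this to the single-table option.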

sgagliano commented 6 years ago

@abecasis Regarding UKBB hits, should we use HRC-imputed UKBB GWAS results? Presumably, not yet TOPMed-imputed?

abecasis commented 6 years ago

My comments on this thread:

Goncalo

(Replying by email to sgagliano's comment above, sent Thu, Mar 15, 2018 at 1:05 PM.)

welchr commented 6 years ago

This is mostly done for the EBI catalog now. Data is in the database and there is an API endpoint for it, documented below (but you will need to replace api with api_internal_dev for now, as it's not deployed to production yet.)

http://portaldev.sph.umich.edu/docs/api/v1/#gwas-catalogs

Take a look and see if this fits your needs and let me know if any modifications are needed. If it looks fine, I'll deploy it. Please do not use the dev endpoint in production.

The UKBB hits are still being parsed; that data needed some extra steps that had already been done for the EBI catalog.

sgagliano commented 6 years ago

Thank you Ryan! @pjvandehaar - could you please have a look to see if this can be integrated into the LZ in PheWeb or if modifications are needed.

abought commented 6 years ago

As mentioned to Sarah, I'm also available for any questions on new visualization options in LZ.js (depending on how you want to display this info). It never hurts to get features used in the wild and find ways to improve. :)

welchr commented 6 years ago

Forgot to mention last week - UKBB hits are available now too, for both GRCh37 and 38.

sgagliano commented 6 years ago

Thank you Ryan! Just to clarify, the UKBB "hits" are defined as variants that reached genome-wide significance (p<5E-8) in the HRC-imputed analysis for any of the 1400 phecodes?

welchr commented 6 years ago

Exactly. If there's a more stringent threshold to use given 1400 traits were tested, let me know and I can filter down further.
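The current definition is a plain genome-wide cutoff, p < 5e-8. The thread leaves the stricter alternative open; one common (and conservative, since phecodes are correlated) choice would be a Bonferroni correction over the 1400 phecodes, giving 5e-8 / 1400, about 3.57e-11. This is an illustrative assumption, not a decision made here, and the row shape and `filter_hits` name are hypothetical:

```python
GENOME_WIDE = 5e-8  # threshold described in the thread
N_PHECODES = 1400   # number of traits tested, per the thread

# Hypothetical stricter cutoff: Bonferroni over all phecodes (about 3.57e-11).
BONFERRONI = GENOME_WIDE / N_PHECODES


def filter_hits(rows, threshold=GENOME_WIDE):
    """Keep rows whose `pvalue` is strictly below `threshold`."""
    return [r for r in rows if r["pvalue"] < threshold]
```

Passing `threshold=BONFERRONI` would filter the stored hits down further without changing the underlying table.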

abought commented 6 years ago

Following up: @welchr , is this ticket still active based on the API work? If not, is it safe to close?

Is any additional support needed on my end to make this ticket a reality?

welchr commented 6 years ago

It's merged into master on the API/DB side. I think @pjvandehaar needs to try incorporating it into PheWeb, and then file additional issues if visualization methods other than tooltips are necessary for displaying the information.

abought commented 6 years ago

Thanks! Because we're awaiting user feedback to verify, I'll keep this ticket open for now. (I'm just going through the tracker to weed out open issues. Feel free to close once the DB work is accepted.)

abought commented 6 years ago

Attaching a screenshot from initial experiments, to use as a prop during a scheduled discussion with Sarah today.

This demonstrates two options:

  1. A rug track (marks which SNPs are in the catalog at all)
  2. A label track (it's not unusual for a hit to appear in the catalog multiple times, e.g. associated with several traits or trait groups)
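Both tracks can be fed from the same catalog rows once repeated entries are collapsed per variant: the rug track needs each catalog variant once, and the label track needs one combined label per variant. A minimal sketch, assuming hypothetical rows with `variant` and `trait` keys (not the actual API response shape):

```python
def build_tracks(catalog_rows):
    """Return (rug, labels) built from catalog rows.

    rug: catalog variants in first-seen order, each listed once.
    labels: variant -> traits joined with '; ', duplicates removed.
    """
    traits_by_variant = {}  # dicts preserve insertion order (Python 3.7+)
    for row in catalog_rows:
        traits = traits_by_variant.setdefault(row["variant"], [])
        if row["trait"] not in traits:  # drop repeated reports of one trait
            traits.append(row["trait"])
    rug = list(traits_by_variant)
    labels = {v: "; ".join(ts) for v, ts in traits_by_variant.items()}
    return rug, labels
```

Joining traits into a single label avoids stacking several overlapping labels on one point when a hit appears in the catalog many times.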

Other display options, like tables, are also a possibility.

[Screenshot from 2018-09-14 not preserved.]

abought commented 6 years ago

The associated PR was merged and is awaiting pheweb integration. Closing this ticket accordingly.