statgen / locuszoom-api

Flask server code for LocusZoom APIs
https://portaldev.sph.umich.edu/docs/api/v1/
1 stars 3 forks source link

LD data unavailable for some regions #16

Closed abought closed 6 years ago

abought commented 6 years ago

Summary

Some of the regions for the API do not return LD data. This means that (some) sample LocusZoom js plots are rendered in solid grey instead of in color. This includes the default plot that is the first demo on the landing page.

Description/ steps to reproduce

See this page: default plot, and "top hits" links on the left-hand sidebar of the page. http://statgen.github.io/locuszoom/

This issue causes the sample plots to appear grey instead of in color. (eg the base example and some of the top hits like TCFL72)

screen shot 2017-12-04 at 11 46 35 am

Other plot regions have LD data and show up in color just fine, eg FTO.

screen shot 2017-12-04 at 11 46 28 am

Scope/ impacts

For LZ demo purposes, a work around is available: we can deal with this by changing the default region on the example plot in our next release. Therefore a backend fix would be nice, but is not urgent.

Notes

Ryan suggests that the issue is with the cache mechanism for the LD endpoint.

welchr commented 6 years ago

Some extra notes...

For the broken region, directly querying the LD server works just fine:

http://portaldev.sph.umich.edu/api_ld/ld?population=ALL&chromosome=10&startbp=114458349&endbp=115058349&variant=10:114758349_C/T

But accessing it via the locuszoom API endpoint with the exact same parameters fails:

https://portaldev.sph.umich.edu/api/v1/pair/LD/results/?filter=reference eq 1 and chromosome2 eq '10' and position2 ge 114458349 and position2 le 115058349 and variant1 eq '10:114758349_C/T'&fields=chr,pos,rsquare

Should be just a matter of dropping into the debugger and checking out what's failing. I will try to look as soon as possible but higher priority items pending first.

welchr commented 6 years ago

I started to take a look at this and unfortunately accidentally caused the cache to store the value at this locus, so I'm without a working test example for now.

Next time I'll do the smart thing and have redis run a snapshot before I try this again.

abought commented 6 years ago

Try some of the other example page top hits, eg ZBED3- those still seem to show grey for LD. http://statgen.github.io/locuszoom/

("Always carry a spare bug, just in case")

welchr commented 6 years ago

OK awesome, I didn't know other regions had the same issue.

abought commented 6 years ago

OK awesome, I didn't know other regions had the same issue.

Had this been that exclusive, I'd start to wonder what I'd done to make the universe single me out for special treatment. ;)

Heading out to show my face at the department holiday party, but on later if you want help debugging things. (I'll have some quality laptop time in between stages of bread proofing)

welchr commented 6 years ago

Unfortunately ZBED3 looks like a separate issue. The reference variant is 5:76453765_C/A,T, and I don't believe the reference panel we're using for 1000G contains multi-allelic variants specified in that way (my guess is they would be split into two variants C/A and C/T). So LD could never be calculated for that variant.

I think this issue crept in after updating to dbSNP 150. We have to convert rsIDs into the chr:pos_ref/alt naming for the database/API when parsing some older datasets. Unfortunately, dbSNP 150 has a lot more variants with multiple ALT alleles, my guess is because of the addition of some large sequencing study (finding new rarer ALT alleles at same positions.) An rsID that used to simply resolve to 5:76453765_C/A now resolves to 5:76453765_C/A,T because of the discovery of the T allele during the update, and from the older data it may not be possible to figure out which allele they were actually referring to by the rsID alone.

If the older datasets contain an effect and non-effect allele, that could be enough to figure it out. If they don't, and all they specify is rsID and p-value, we may be stuck for now being unable to calculate LD, since we won't ever be able to match it against the reference panel (we don't know which alleles it refers to.)

All that aside, the majority of cases (>98%) where LD can't be computed is just due to a perfectly normal biallelic variant that isn't present in the reference panel. For this, the following would help:

  1. Return some type of error code denoting LD failed to compute because the variant was not in the reference panel
  2. locuszoom sees the error and hints to the user they may want to try a different reference panel
  3. Implement UI to show information on available reference panels (depends on #2), and choose which one to use

In summary, there are 3 separate issues to resolve:

  1. Database update causes rsID -> multi-allelic variants, which do not appear ever in a reference panel that I know of
  2. LD fails to compute for some biallelic variants that aren't in the reference panel (majority of cases, but not well handled currently as far as error reporting)
  3. A thorny issue of one perfectly normal variant, that does exist in the reference panel, but is not being returned from the cache correctly (the chr10 TCF7L2 variant in the original comment above.)

Fixing the 2nd point there is probably the biggest bank for the buck and clearest to resolve. If we can trigger the 3rd at some point, then I'll take a look further. The 1st is probably a longer term resolution.

abought commented 6 years ago

Thanks for the informative workup; overall the plan and triage appears to be good.

I like the idea of an error code to say that the specified API request does not make sense in principle. And that does seem like something we'd want to coordinate with LZjs to make sure the error is handled properly.

That said: would it make sense for our demo page not to make requests that can't be resolved in principle? Is there an updated version of the request URL that could be used to make this example meaningful?

welchr commented 6 years ago

Totally, we shouldn't keep the ZBED3 example on the demo page (though it would be good to capture the coordinates / reference SNP as a test case for future reference.)

There's unfortunately right now no way to modify the request to work, it's more of a problem with the underlying data/database that needs to be resolved.

abought commented 6 years ago

it would be good to capture the coordinates / reference SNP as a test case for future reference

Will remove this from LZjs demos as part of the credible sets WIP. (and will aim to post rough draft of that PR by end of day) We have enough other examples that a 1:1 replacement isn't necessary.

Since we use version control, we won't need to do anything special to track the coordinates, but I'll copy them on this ticket for any future related development work:

"5:76427311", "ZBED3"