statgen / locuszoom

A Javascript/d3 embeddable plugin for interactively visualizing statistical genetic data from customizable sources.
https://statgen.github.io/locuszoom/
MIT License

[request] Display pre-computed credible sets #146

Closed welchr closed 4 years ago

welchr commented 6 years ago

Dropping a ticket in here so we remember from today's call.

Ben would need some way to display credible sets within LocusZoom, when they've already computed the credible sets outside of LocusZoom.

This could possibly be:

  1. A credible set data source that can draw from an API with pre-computed credible sets. This could be subclassed to be specific to the Broad API. We would need an example of Ben's API/data for this, which he agreed to send. One small hiccup is that they often have multiple credible sets for a given set of association results, whereas we had only originally thought of there being 1 credible set shown per panel.

  2. A mechanism for the page to update the currently highlighted variants as a credible set, if the page needed to synchronize updates with LZ. This sounded much harder to implement.
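
To make option 1 and the "multiple credible sets" hiccup more concrete, here is a minimal sketch of how pre-computed membership rows could be grouped by set and merged onto association results. The record shape (`variant`, `credible_set_id`, `posterior_prob`), the variant IDs, and both function names are hypothetical; they are not taken from the Broad API or from LocusZoom.

```javascript
// Hypothetical rows from an external API: one row per (variant, credible set)
// membership. All field names and values here are illustrative assumptions.
const credibleSetRows = [
  { variant: "10:114758349_C/T", credible_set_id: "cs1", posterior_prob: 0.62 },
  { variant: "10:114754071_T/C", credible_set_id: "cs1", posterior_prob: 0.33 },
  { variant: "10:114808902_G/T", credible_set_id: "cs2", posterior_prob: 0.91 },
];

// Group membership rows by credible set so each set can be drawn separately.
function groupByCredibleSet(rows) {
  const sets = new Map();
  for (const row of rows) {
    if (!sets.has(row.credible_set_id)) {
      sets.set(row.credible_set_id, []);
    }
    sets.get(row.credible_set_id).push(row);
  }
  return sets;
}

// Annotate association records with the set (if any) each variant belongs to.
// Assumes a variant belongs to at most one set; if it appears in several,
// the last row wins.
function annotateAssociations(assocRecords, rows) {
  const byVariant = new Map(rows.map((row) => [row.variant, row]));
  return assocRecords.map((rec) => {
    const hit = byVariant.get(rec.variant);
    return {
      ...rec,
      credible_set_id: hit ? hit.credible_set_id : null,
      posterior_prob: hit ? hit.posterior_prob : null,
    };
  });
}

const assocRecords = [
  { variant: "10:114758349_C/T", log_pvalue: 12.3 },
  { variant: "10:114700000_A/G", log_pvalue: 1.1 },
];
const groupedSets = groupByCredibleSet(credibleSetRows);
const annotated = annotateAssociations(assocRecords, credibleSetRows);
console.log([...groupedSets.keys()], annotated);
```

Keeping the set ID on every annotated record leaves the "one set per panel" versus "several sets per panel" decision to the layout rather than to the data source.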

Please re-word or nuke from orbit anything I wrote above.

abought commented 6 years ago

In order to write a source that can handle the data, it would help to have a better sense of how the data is formatted / retrieved. @benralexander, could you please send something like this via email for future followup?

This would be a tentative future goal, but let me know if visualization plans progress far enough to make it a defined need. (that would bump it up in the queue)

Sample questions that I would be looking to answer on the data end:

  1. Is the data (currently) loaded via an API endpoint, or embedded in the page as JSON?
  2. Does this data change after the initial page load?
  3. Are all the credible sets defined in one blob, or as individual entries that are fetched and parsed separately? ("alldata.json" vs "credset1.json, credset2.json...")

What would you want to do with the credible set members on the plot?

If the data changes, who is responsible for driving the new calculation?
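
The "one blob vs. individual entries" question matters mostly for retrieval; once loaded, either shape can be normalized to the same flat list. The sketch below assumes two hypothetical payload shapes (neither is a known format of any real API) and shows that both reduce to identical rows.

```javascript
// Shape A (hypothetical "alldata.json"): one object keyed by credible set ID.
const allDataBlob = {
  cs1: [{ variant: "10:114758349_C/T", posterior_prob: 0.62 }],
  cs2: [{ variant: "10:114808902_G/T", posterior_prob: 0.91 }],
};

// Shape B (hypothetical "credset1.json", "credset2.json", ...): one payload
// per set, each fetched and parsed separately.
const separatePayloads = [
  { id: "cs1", members: [{ variant: "10:114758349_C/T", posterior_prob: 0.62 }] },
  { id: "cs2", members: [{ variant: "10:114808902_G/T", posterior_prob: 0.91 }] },
];

// Flatten shape A into one row per member, tagged with its set ID.
function rowsFromBlob(blob) {
  return Object.entries(blob).flatMap(([setId, members]) =>
    members.map((m) => ({ credible_set_id: setId, ...m }))
  );
}

// Flatten shape B into the same row format.
function rowsFromPayloads(payloads) {
  return payloads.flatMap((p) =>
    p.members.map((m) => ({ credible_set_id: p.id, ...m }))
  );
}

const rowsA = rowsFromBlob(allDataBlob);
const rowsB = rowsFromPayloads(separatePayloads);
console.log(JSON.stringify(rowsA) === JSON.stringify(rowsB));
```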

abought commented 4 years ago

No activity on this one for some time, so closing this ticket. Feel free to reopen if new work requires.

For now we shall assume that mechanisms like subscribeToData are a solution for coordinating the table with the plot. The nature of the API endpoint is up to the user but LocusZoom would be able to fetch that data.
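
As a rough sketch of the table-coordination idea: the function below turns the records shown on a plot into table rows, keeping only credible set members. The field names (`is_member`, `posterior_prob`, `variant`) are assumptions and would depend on the layout in use. On a real page, something like this could be called from the callback given to `plot.subscribeToData`; the arguments that method expects differ between LocusZoom versions, so they are deliberately not shown here.

```javascript
// Build table rows from plot records: keep credible set members only,
// ordered from highest to lowest posterior probability.
// Field names are hypothetical and depend on the plot layout.
function credibleSetTableRows(records) {
  return records
    .filter((rec) => rec.is_member)
    .sort((a, b) => b.posterior_prob - a.posterior_prob)
    .map((rec) => ({
      variant: rec.variant,
      posterior_prob: rec.posterior_prob,
    }));
}

// Illustrative records, standing in for what a data subscription might deliver.
const plotRecords = [
  { variant: "10:114754071_T/C", is_member: true, posterior_prob: 0.33 },
  { variant: "10:114700000_A/G", is_member: false, posterior_prob: 0.001 },
  { variant: "10:114758349_C/T", is_member: true, posterior_prob: 0.62 },
];
const tableRows = credibleSetTableRows(plotRecords);
console.log(tableRows);
```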

psytky03 commented 3 years ago

Hi, just curious: how should I prepare a JSON file with multiple loci?

abought commented 3 years ago

Thanks for the question. Could you clarify why your question was posted to this particular issue? Are you using your own code, or making LZ plots with a predefined tool?

In general, your data doesn't need to be in JSON format at all: there are many ways to use more familiar genetics formats, like tabixed files. For example:

  1. Make a plot locally, without uploading (includes simple credible sets feature): https://statgen.github.io/localzoom/
  2. Upload to a server for a better experience, including summaries like a Manhattan plot: https://my.locuszoom.org/

If you are writing your own website, LocusZoom.js also provides a way to retrieve data from tabix files (if you write a function to parse each line into an object with the appropriate fields). See demo and extension docs
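
For illustration, a line parser of the kind described above might look like the sketch below. The column order (chromosome, position, ref, alt, -log10 p-value), the output field names, and the variant ID format are all assumptions about a hypothetical file; the actual fields required depend on your layout, and how the parser is registered is covered in the extension docs rather than here.

```javascript
// Parse one tab-delimited line into an object.
// Assumed column order: chrom, pos, ref, alt, -log10(p). Adjust to your file.
function parseAssocLine(line) {
  const [chrom, pos, ref, alt, negLogP] = line.replace(/\r?\n$/, "").split("\t");
  return {
    chromosome: chrom,
    position: Number(pos),
    ref_allele: ref,
    alt_allele: alt,
    log_pvalue: Number(negLogP),
    // Assumed ID convention: chrom:pos_ref/alt
    variant: `${chrom}:${pos}_${ref}/${alt}`,
  };
}

const parsedLine = parseAssocLine("10\t114758349\tC\tT\t12.5\n");
console.log(parsedLine);
```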

psytky03 commented 3 years ago

Thanks for the quick response. We actually installed the LocusZoom JS on a local server and are now learning how to use it. We could successfully plot the credible set information for a single locus (we made the JSON file using rjsonlite), but the credible set information cannot be shown when we put multiple loci into that JSON file. I guess the earlier question, "Are all the credible sets defined in one blob, or as individual entries that are fetched and parsed separately? ("alldata.json" vs "credset1.json, credset2.json...")", might be related to our problem...

abought commented 3 years ago

Normally LZ calculates credible sets dynamically, and adds a few annotations / hints to make the visualization work. I think that the answer to your question would depend on the details of the file format and plot code: perhaps your sample data is missing an expected field, or there is something odd about the multi-locus file?

(for example, sometimes the "join" logic to connect two datasets depends on the assumption that association variants are sorted. Depending on how you concatenate multiple loci, that may or may not be true)
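
One way to test the sorting assumption on a concatenated file is a quick check like the sketch below. It assumes each record has `chromosome` and `position` fields (hypothetical names), and it treats the data as sorted when every chromosome appears in a single contiguous run with non-decreasing positions. It does not reproduce LocusZoom's own join logic.

```javascript
// Returns true if each chromosome forms one contiguous run and positions
// within each run never decrease. Field names are assumptions.
function isSortedByLocus(records) {
  const seenChromosomes = new Set();
  let prev = null;
  for (const rec of records) {
    if (prev === null || rec.chromosome !== prev.chromosome) {
      // Starting a new run: the chromosome must not have appeared before.
      if (seenChromosomes.has(rec.chromosome)) {
        return false;
      }
      seenChromosomes.add(rec.chromosome);
    } else if (rec.position < prev.position) {
      return false;
    }
    prev = rec;
  }
  return true;
}

const sortedLoci = [
  { chromosome: "10", position: 100 },
  { chromosome: "10", position: 200 },
  { chromosome: "12", position: 50 },
];
const splitLoci = [
  { chromosome: "10", position: 100 },
  { chromosome: "12", position: 50 },
  { chromosome: "10", position: 300 },
];
console.log(isSortedByLocus(sortedLoci), isSortedByLocus(splitLoci));
```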

We're definitely trying to make it so that people don't need to convert their text files into JSON format in order to use LZ! Formats like tabix are useful because they can be used to query only the data for a specific region, rather than loading a huge amount of data to the browser for regions you aren't looking at.

psytky03 commented 3 years ago

Thanks for all the explanation! It is strange that when we include only one locus everything works fine (which suggests the fields are not the problem), and for the multi-locus file, the data is sorted by CHROM and POS. It is somewhat unclear to me how I should prepare a multi-locus JSON file, as the one in the example contains only one locus (https://github.com/statgen/locuszoom/blob/develop/examples/data/assoc_10_114550452-115067678.json). It would be helpful if there were an example I could look into and copy the format from.

abought commented 3 years ago

Thanks for the question.

It's hard to advise you on how to prepare a JSON file for more than one locus, because we really discourage doing that in practice: some GWAS files are > 1 GB now, and we try not to promote approaches that would crash the browser if someone tried to scale the example to a generic problem.

Tools like our tabix URL source fetch just the data for the region currently being viewed, then convert the data to a JSON-like format in the browser. (saves on redundant storage + much less data transfer) For big sites, there is usually a web service that coordinates this retrieval, but tabix files are useful if you don't want to set up the infrastructure.
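
The core idea of region-based retrieval can be imitated on a small in-memory dataset, as in the sketch below: given records for several loci, return only those inside the region being viewed. This is not how the tabix source is implemented; the record fields and the `{ chr, start, end }` region shape are assumptions chosen for illustration, and the approach still requires the whole file to be loaded first, which is exactly what does not scale.

```javascript
// Keep only records that fall inside the requested region (inclusive bounds).
// Record and region field names are assumptions for this sketch.
function recordsInRegion(records, region) {
  return records.filter(
    (rec) =>
      rec.chromosome === region.chr &&
      rec.position >= region.start &&
      rec.position <= region.end
  );
}

const multiLocusRecords = [
  { chromosome: "10", position: 114600000, log_pvalue: 3.2 },
  { chromosome: "10", position: 114758349, log_pvalue: 12.5 },
  { chromosome: "12", position: 5000000, log_pvalue: 8.1 },
];
const viewRegion = { chr: "10", start: 114550452, end: 115067678 };
const visibleRecords = recordsInRegion(multiLocusRecords, viewRegion);
console.log(visibleRecords.length);
```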

Otherwise, without seeing an example file (or examples of your code, or an example of what the plot looks like with the multi-locus file), it's hard to debug what's going on. Feel free to reach out privately at abought /@/ umich/./ edu for anything you don't feel comfortable posting publicly.

Our "Guides" docs try to answer some basic questions about how layouts and data retrieval interoperate, if that helps?

We are working on adding a more rigorous mechanism to validate the expected data for a given rendering, but it involves a pretty big internal rework and there isn't a clear release timeline just yet.