jbloomlab / SARS-CoV-2-RBD_DMS

Deep mutational scanning of the receptor-binding domain of SARS-CoV-2 Spike
BSD 3-Clause "New" or "Revised" License

Interactive heat map v1 #54

Closed by jbloom 4 years ago

jbloom commented 4 years ago

@tylernstarr, so I have been looking over the Figure 2 draft.

I continue to think it's really important that we have easily accessible data. However, I agree the heat maps aren't working very well as main figures. My thought is that the goal of these is really to provide a data look-up table for people who can't effectively parse the CSVs. The current Figure 2 heatmaps don't work well for this for several reasons, including being hard to read off the values and too small.

So I propose we instead make interactive heat maps as a supplementary figure that we can also host as HTML. These effectively are then graphical lookup tables for users.

This pull request implements a rough draft in altair. I have attached the generated heat map, and it would be great if you can look at that. (I had to ZIP it because for some reason GitHub doesn't allow HTML attachments.) Attachment: interactive_heatmap.html.zip
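For readers who have not seen this kind of chart, here is a minimal sketch of a site-by-mutant heat map with mouseover tooltips. It is not the code in this pull request: it builds the Vega-Lite JSON (the format altair generates) by hand so it runs with only the standard library. The field names `site`, `mutant`, and `bind`, the helper `heatmap_spec`, and the numeric values are all made up for illustration.

```python
import json


def heatmap_spec(records, value_field="bind"):
    """Return a Vega-Lite v5 spec (a plain dict) for a site-by-mutant heat map.

    `records` is an iterable of dicts with hypothetical keys
    "site", "mutant", and `value_field`.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": list(records)},
        "mark": "rect",  # one colored rectangle per (site, mutant) pair
        "encoding": {
            "x": {"field": "site", "type": "ordinal"},
            "y": {"field": "mutant", "type": "nominal"},
            "color": {"field": value_field, "type": "quantitative"},
            # Tooltip fields shown on mouseover; this is the "lookup table" part.
            "tooltip": [
                {"field": "site", "type": "ordinal"},
                {"field": "mutant", "type": "nominal"},
                {"field": value_field, "type": "quantitative"},
            ],
        },
    }


# Placeholder values, not real measurements.
records = [
    {"site": 1, "mutant": "A", "bind": -0.1},
    {"site": 1, "mutant": "C", "bind": -1.2},
    {"site": 2, "mutant": "A", "bind": 0.0},
    {"site": 2, "mutant": "C", "bind": -0.4},
]

spec = heatmap_spec(records)
spec_json = json.dumps(spec, indent=2)
```

The resulting JSON can be pasted into any Vega-Lite viewer. Reading values off the tooltip is what makes the chart work as a graphical lookup table.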

This interactive heatmap has a few nice features:

Note that it still lacks some desirable features, such as zooming the x (site) axis, labeling wildtype residues with marks, good legend labels, and perhaps the color scheme.

My basic question at this point is: do you think an improved interactive heat map in the supplement would be a good thing? If so, I think we can fix the above things pretty easily in altair. I would propose we ask @skhilton to do this since she is awesome with altair, although if she can't then I expect I could figure it out in 6-8 hours. If we want to use an improved version of these, we should probably first hash out specs.

(I also have many additional thoughts about Figure 2 if we are taking out heat maps, but will organize those and post later).

tylernstarr commented 4 years ago

Ok I think this sounds super, super cool. I will take a look at this a bit later. I still think a static heat map of the binding data in the main text might be a good idea -- I think Figure 2 just becomes a lot worse when we try to include two separate large heat maps. I'll think more as I look at these, but I think having the binding heat map as main text, and then interactive binding+expression heatmaps as supplement would be ok

jbloom commented 4 years ago

Great. After you look at the interactive heat map and firm up your impressions, if we decide to include and improve we can outline exact specs for @skhilton and see if she thinks feasible.

tylernstarr commented 4 years ago

Ok, so I looked at the figure, and I'm a big fan! Some of my questions are just about logistics:

Suggestions and tweaks, mainly agreeing with the things you mention you didn't focus on too much yet, to organize future steps. It might be easiest to first finalize our static heatmaps, and then try to emulate these in the interactive heatmaps. Happy for suggestions from anybody about tweaks to make to these heatmaps!

tylernstarr commented 4 years ago

Oh -- another metric I was thinking about trying to layer on the static heatmap (though I think it would be going a bit too far/falling into overly-complex heatmap hell), is Neff from our sarbecovirus alignment. But, that could be interesting to add as an annotation in the callout box in the interactive heatmap?

I currently calculate Neff in a couple of sub-analyses, but I'll just push that calculation upstream so it's in the master data/RBD_sites.csv file as another annotation column.
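The thread does not define Neff, so the sketch below rests on an assumption: that Neff at a site is the effective number of amino acids, computed as the exponential of the Shannon entropy of the residue frequencies in that alignment column. Gaps are ignored and sequences are unweighted. The function names `n_eff` and `n_eff_per_site` are hypothetical, and the repository's actual calculation may differ.

```python
import math
from collections import Counter


def n_eff(column, gap_chars="-"):
    """Effective number of amino acids in one alignment column.

    Assumed definition: exp(H), where H is the Shannon entropy (natural log)
    of the residue frequencies, with gap characters ignored.
    """
    residues = [aa for aa in column if aa not in gap_chars]
    if not residues:
        return float("nan")  # column is all gaps
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def n_eff_per_site(aligned_seqs):
    """Compute n_eff for every column of equal-length aligned sequences."""
    return [n_eff(col) for col in zip(*aligned_seqs)]


# Toy alignment, not real sarbecovirus sequences.
toy_alignment = ["AC-", "AD-", "AC-", "AD-"]
per_site = n_eff_per_site(toy_alignment)
```

Under this definition a fully conserved column gives 1, and a column split evenly between two residues gives 2.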

Looking through this RBD_sites.csv master annotation makes me wonder if we could efficiently represent any of these other annotations in the interactive callout? Mainly, epitopes -- it would be awful to have a separate annotation T/F for every single antibody. But if it were possible to have just a single "epitope: xxx" annotation in the callout box, where xxx lists every epitope for which the annotation is "True", that could be a nice feature to add. (So, for many sites it would be the empty "epitope: None", but for some others it might be a list, e.g. "epitope: 80R, m396, B38".)
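One way to get that single callout string is to collapse the per-antibody True/False columns before plotting. This is only a sketch: it assumes the annotation file has one column per antibody, named after the antibody, holding the text "True" or "False". The actual layout of RBD_sites.csv may differ, and the helper name `epitope_label` and the rows below are invented for illustration.

```python
import csv
import io


def epitope_label(row, antibody_columns):
    """Collapse per-antibody True/False columns into one tooltip string."""
    hits = [
        ab for ab in antibody_columns
        if str(row.get(ab, "")).strip().lower() == "true"
    ]
    return "epitope: " + (", ".join(hits) if hits else "None")


# Stand-in for the annotation file; column names and values are placeholders.
example_csv = io.StringIO(
    "site,80R,m396,B38\n"
    "1,False,False,False\n"
    "2,True,False,True\n"
)
antibodies = ["80R", "m396", "B38"]

labels = {
    row["site"]: epitope_label(row, antibodies)
    for row in csv.DictReader(example_csv)
}
```

The resulting string can then be added as one extra tooltip field, so the callout box grows by a single line no matter how many antibodies are annotated.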

skhilton commented 4 years ago

@tylernstarr, @jbloom: very excited about these. I can think of two ways going forward organizationally but open to other suggestions

  1. leave this PR open and we can continue commenting and pushing to it
  2. Close this PR, I'll start my own branch, we open an issue to keep the notes/conversation going

Open to either, just let me know.

tylernstarr commented 4 years ago

Ok, I think staying on this branch makes sense for now!

jbloom commented 4 years ago

As far as the branch, I wonder if it makes sense to merge this as v1 heat map, have @tylernstarr do the updated annotations he was talking about, and then make a new branch to do a v2 heat map? I don't feel strongly though---just know it will be easier if branch isn't diverging from master in many different files.

As far as rendering the heat map, it doesn't show up in a static Jupyter notebook. But it can be saved as either HTML or JSON that can be rendered by any web browser. Whether that can directly be a supplemental figure depends on whether the journal allows HTML supplementary files. If not, we can link to a webpage. It is actually somewhat hard to render HTML directly in GitHub, and we may have to do it via GitHub Pages (or maybe there is another way). In any case, I think figuring out how to render the interactive HTML directly from GitHub can wait until the end.
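To make the HTML-versus-JSON point concrete, here is a rough sketch of how a chart's JSON spec can be wrapped in a standalone page that a browser renders with the vega-embed JavaScript library. altair's own `chart.save("file.html")` produces a similar file; this hand-rolled version uses only the standard library. The tiny spec, the helper names `spec_to_html` and `write_html`, and the pinned major versions in the CDN URLs are assumptions for illustration.

```python
import json

HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>vegaEmbed("#vis", __SPEC__);</script>
</body>
</html>
"""


def spec_to_html(spec):
    """Embed a Vega-Lite spec (a dict) in a self-contained HTML page."""
    # Escape "</" so text inside the spec cannot close the <script> tag early.
    spec_json = json.dumps(spec).replace("</", "<\\/")
    return HTML_TEMPLATE.replace("__SPEC__", spec_json)


def write_html(spec, path):
    """Write the page to disk so it can be opened in a browser or hosted."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(spec_to_html(spec))


# Placeholder spec with made-up values.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"site": 1, "mutant": "A", "bind": -0.1}]},
    "mark": "rect",
    "encoding": {
        "x": {"field": "site", "type": "ordinal"},
        "y": {"field": "mutant", "type": "nominal"},
        "color": {"field": "bind", "type": "quantitative"},
    },
}
html = spec_to_html(spec)
```

Calling `write_html(spec, "interactive_heatmap.html")` gives a single file that could be committed and served from a static host. It still needs network access to load the three scripts, unless those are inlined as well.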

I think it would be possible to have a dropdown box for antibodies that would then highlight those sites when antibodies are selected, similar to the way that mouseover now highlights sites.

I agree with @tylernstarr about it being important to somehow have enough info in the tool tips text box, but not so much it becomes overwhelming.

My suggestion is that @tylernstarr puts together some issues in a GitHub project board about the things to address, and we try to prioritize them for @skhilton. Some of them (like somehow making the x site axis zoomable) seem essential, as does fixing the color scales. Others, like adding antibodies, seem nice but less important.