scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Interactive scatter plots #253

Open falexwolf opened 6 years ago

falexwolf commented 6 years ago

Hi @ivirshup!

We discussed this in Aptos a couple of months ago. Adding an interactive parameter to all the scatter plots would be really useful for working in notebooks. Would you consider adding that functionality, since you have a lot of experience with it? Importantly, it should be based on the restructured plotting code that @fidelram is currently working on in https://github.com/theislab/scanpy/pull/244 (we could move that branch to the scanpy repo?). Hence, this would be for post-Scanpy 1.3 and there is no great hurry.

Below is a solution (due to @NDKoehler) that takes an AnnData and creates an interactive plot, but it completely bypasses both the current way scatter plots are generated and Fidel's restructured code. Hence, the task is to think about a good way of integrating this with how scatter plots are done in Scanpy (after Fidel's changes).

from bokeh.plotting import figure, show, output_notebook
from bokeh.models import HoverTool, ColumnDataSource
output_notebook()  # render Bokeh output inline in the notebook

import matplotlib as mpl
import matplotlib.cm      # make sure mpl.cm is loaded
import matplotlib.colors  # make sure mpl.colors is loaded

def plot_interactive(data):

    # Map each CCS value to a viridis color and encode it as a hex string
    normalized = mpl.colors.Normalize()(data.obs['CCS'].values)
    colors = [
        "#%02x%02x%02x" % (int(r), int(g), int(b))
        for r, g, b, _ in 255 * mpl.cm.viridis(normalized)
    ]

    source = ColumnDataSource(dict(
        x=data.obsm['X_umap'][:,0],
        y=data.obsm['X_umap'][:,1],
        color=colors,#data.obs['CCS'],
        label=data.obs['Charge'],
        #msize= p_df['marker_size'],
        #topic_key= p_df['clusters'],
        #title= p_df[u'Title'],
        #content = p_df['Text_Rep']
        seq=data.obs['seq'],
        ccs=data.obs['CCS'],
        charge=data.obs['Charge'],
    ))
    #ax = sc.pl.umap(data, color=['Charge','CCS'])
    #sc.pl.umap(data, color=['CCS'], save='ccs')

    title = 'UMAP visualization of sequences'

    # Axis-free canvas with pan/zoom, hover, and save tools
    plot_lda = figure(width=800, height=600,
                      title=title, tools="pan,wheel_zoom,box_zoom,reset,hover,save",
                      x_axis_type=None, y_axis_type=None, min_border=1)

    # One marker per observation; legend entries come from the 'label' column
    plot_lda.scatter(x='x', y='y', legend_field='label', source=source, color='color',
                     alpha=0.8, size=5)

    # Show sequence, CCS, and charge when hovering over a point
    hover = plot_lda.select(dict(type=HoverTool))
    hover.tooltips = [("Sequence", "@seq"), ("CCS", "@ccs"), ("Charge", "@charge")]
    plot_lda.legend.location = "top_left"

    show(plot_lda)

gokceneraslan commented 6 years ago

Sorry if this is not super relevant, but how does this compare to matplotlib + ipywidgets?

ivirshup commented 6 years ago

@falexwolf, I think it would be worth going over what kind of interactivity would be most useful.

I find that linked selection and summary statistics on selected groups are pretty powerful. For QC plots, it's nice to know other properties of cells that look like outliers. It can also be useful for figuring out what's up with a classification that doesn't agree with your reduced-dimension plot.
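As a rough illustration of the "summary statistics on selected groups" idea, here is a minimal sketch that compares a per-cell metric inside and outside a selection. The function name `summarize_selection` and the toy `n_genes` values are hypothetical. In a Bokeh-based plot the selected indices could come from `source.selected.indices`, but here they are simply passed in as a list.

```python
import numpy as np

def summarize_selection(values, selected):
    """Compare a per-cell metric inside and outside a selection.

    values   : 1-D sequence with one number per cell (e.g. a QC metric)
    selected : integer indices of the selected cells
    """
    values = np.asarray(values, dtype=float)
    mask = np.zeros(values.shape[0], dtype=bool)
    mask[np.asarray(selected, dtype=int)] = True

    def stats(x):
        # Guard against an empty group so the summary never raises
        if x.size == 0:
            return {"n": 0, "mean": float("nan"), "median": float("nan")}
        return {"n": int(x.size),
                "mean": float(x.mean()),
                "median": float(np.median(x))}

    return {"selected": stats(values[mask]), "rest": stats(values[~mask])}

# Toy data (made up): five cells, the last two look like outliers
n_genes = [2000, 2100, 1900, 300, 250]
summary = summarize_selection(n_genes, selected=[3, 4])
print(summary)
```

In an interactive setting, a function like this would be re-run whenever the selection changes, and the result shown next to the plot.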

Being able to interactively search for and select genes to view would also be nice. It would also be good if this kind of visualization were easy to share with non-technical collaborators.

I think there's also a question of scale, and whether it would be nice to use libraries like datashader to avoid the overplotting problems that are so common in this field.
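To make the overplotting point concrete, here is a minimal sketch of the aggregation idea: rather than drawing one marker per cell, bin the points into a fixed pixel grid and display the per-pixel counts. This uses plain NumPy as a stand-in and is not datashader's API. The function name `rasterize_counts` and the random data are assumptions for illustration only.

```python
import numpy as np

def rasterize_counts(x, y, width, height):
    """Bin points into a (height, width) grid of per-pixel counts."""
    # histogram2d bins its first argument along axis 0, so pass y first
    # to get an image-like array indexed as [row (y), column (x)].
    counts, _, _ = np.histogram2d(y, x, bins=(height, width))
    return counts.astype(int)

# Made-up embedding: 100,000 points that would heavily overlap as markers
rng = np.random.default_rng(0)
xy = rng.normal(size=(100_000, 2))

grid = rasterize_counts(xy[:, 0], xy[:, 1], width=200, height=150)
print(grid.shape, grid.sum())

# A log transform keeps sparse regions visible next to dense ones
image = np.log1p(grid)
```

The grid has a fixed size no matter how many cells are plotted, so the rendering cost and the visual density stay bounded as the dataset grows.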

I'm working on a few prototypes at the moment, but I'm not sure how well they fit into an API built around adding an interact flag. Once I have things a little more formalized I'll set up a repo, but I'm open to suggestions for other plot types. I'd be interested in getting opinions on the usefulness of datashader's edge bundling for graph plots (examples here under the header "Bundling graphs").

There's also the issue of libraries. Currently, I'm frustrated with every Python plotting library, but am leaning towards the holoviews, bokeh, datashader stack for this.

@gokceneraslan If you're asking about what usage of bokeh looks like, they have a bunch of notebooks in a repo that'll run on Binder.

falexwolf commented 6 years ago

@ivirshup Thank you for your super elaborate response and treatment of the topic. I completely understand that you're going for a more comprehensive solution than something like the simple bokeh wrapper that I pasted above. I'd really be interested in something that combines datashader and bokeh, for instance. If you're creating your own package for that, it would be awesome if it was somehow possible to use it also for Scanpy.

Marius1311 commented 5 years ago

We have also been thinking about this a bit and came up with a couple of functions in this repo: https://github.com/theislab/interactive_plotting