hackseq / hackseq_projects_2016


Project 7: Visualization of single cell RNA-seq data from tens of thousands of cells in real time #4

Open ttimbers opened 8 years ago

ttimbers commented 8 years ago

Project: Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. Recent developments in single cell RNA sequencing (scRNA-seq) technologies have enabled profiling of tens of thousands of cells in a single sample. However, existing analysis tools are designed for datasets with far fewer cells and for users with a programming background, which limits their adoption. There is a strong need for an intuitive user interface that lets biologists visualize and interpret data in real time, accelerating the discovery cycle of scRNA-seq analysis.

We propose to develop a visualization framework that can be used for scRNA-seq data from tens of thousands of cells in real time. As a starting point, we can use the Python interactive visualization library bokeh to display scRNA-seq data. While the goal is to design a framework that is compatible with scRNA-seq data from most technologies, we will use scRNA-seq data from 10x Genomics as a demonstration dataset. It includes a 68k-cell scRNA-seq dataset from peripheral blood mononuclear cells (PBMCs) and 8k-cell scRNA-seq data from bone marrow samples of a healthy individual and a patient with acute myeloid leukemia.

Some basic functions the visualization framework should support are:

1) dynamic visualization of cells by principal components and clustering;
2) coloring of cells by expression of specific markers;
3) selection of a subset of cells;
4) identification of cluster-specific markers; and
5) visualization compatibility for multiple samples.

Participants are encouraged to explore and add other functionalities.

The data has been processed by the 10x single-cell pipeline, and the output files (matrices, principal component and clustering outputs) have been saved as text files. Team members are expected to familiarize themselves with these output files before the hackathon. Team members should also be familiar with applications of single cell RNA-seq and with bokeh.

The proposed schedule is to sketch out an interface and functionalities on Day 1. Day 2 and part of Day 3 will be used for implementation and evaluation, and the end of Day 3 will be used for the demo.
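To make one of the basic functions above concrete, here is a minimal sketch of cluster-specific marker identification. It is not the 10x pipeline's method: it simply ranks genes by the log2 fold change of mean expression inside a cluster versus all other cells. The function name `rank_cluster_markers`, the pseudocount, and the toy counts are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def rank_cluster_markers(expression, clusters, top_n=2, pseudocount=1.0):
    """Rank genes per cluster by log2 fold change (cluster vs. all other cells).

    expression: dict mapping gene name -> list of per-cell counts
    clusters:   list of cluster labels, one per cell, in the same cell order
    Hypothetical helper for illustration only.
    """
    cells_by_cluster = defaultdict(list)
    for idx, label in enumerate(clusters):
        cells_by_cluster[label].append(idx)

    n_cells = len(clusters)
    markers = {}
    for label, members in cells_by_cluster.items():
        n_in = len(members)
        n_out = n_cells - n_in
        if n_out == 0:
            continue  # a single cluster has nothing to compare against
        scored = []
        for gene, counts in expression.items():
            sum_in = sum(counts[i] for i in members)
            mean_in = sum_in / n_in
            mean_out = (sum(counts) - sum_in) / n_out
            # The pseudocount avoids division by zero for unexpressed genes.
            lfc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
            scored.append((lfc, gene))
        # Highest fold change first; ties broken alphabetically by gene name.
        scored.sort(key=lambda t: (-t[0], t[1]))
        markers[label] = [(gene, round(lfc, 3)) for lfc, gene in scored[:top_n]]
    return markers


# Toy data (made up): 6 cells, 3 genes, 2 clusters.
expression = {
    "CD3E": [5, 6, 4, 0, 0, 1],
    "CD14": [0, 1, 0, 7, 8, 6],
    "ACTB": [9, 9, 9, 9, 9, 9],
}
clusters = [1, 1, 1, 2, 2, 2]
markers = rank_cluster_markers(expression, clusters, top_n=1)
print(markers)
```

For tens of thousands of cells, a ranking like this would more sensibly be precomputed once per clustering than recalculated on every interaction.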

Project Lead: Grace Zheng / @gracezheng / Industry Professional / 10x Genomics

sjackman commented 8 years ago

We're planning to have a Docker image with a bunch of bioinformatics software preinstalled running on machines at the BC Cancer Agency Genome Sciences Centre during the Hackathon. Which bioinformatics software do you plan to use for your project? In particular, is there any software that you plan to use that is not already listed here? http://www.bcgsc.ca/services/orca

oneillkza commented 8 years ago

Is this project wedded to PCA? Both viSNE and SPADE provide more sophisticated dimensionality reduction methods that have been applied with success to single-cell phenotypic data. The main benefit for visualisation is that these methods target two-dimensional space (whereas PCA can have half a dozen important components or more).

The SPADE code seems to be languishing a little, but t-SNE, the algorithm underlying viSNE, is widely implemented and could easily be applied to this data.

oneillkza commented 8 years ago

For stretch goals, overlaying some functional information (e.g. GO terms or Reactome pathways) would be neat. A simple way to do this would be to allow the user to specify a term of interest. A more sophisticated way would be to do functional enrichment analysis and visualise the results of that.
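As a rough illustration of the "more sophisticated" option, functional enrichment of a cluster's marker genes for one term is commonly tested with a one-sided hypergeometric test. The sketch below assumes that choice of test; the function names and the tiny gene sets are hypothetical, and real GO or Reactome annotations would have to be loaded separately.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_term are annotated with the term."""
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def term_enrichment(marker_genes, term_genes, background_genes):
    """Enrichment p-value of one term among a cluster's marker genes.

    All three arguments are sets of gene names; genes outside the
    background are ignored. Hypothetical helper for illustration.
    """
    markers = marker_genes & background_genes
    annotated = term_genes & background_genes
    return hypergeom_enrichment_p(
        len(background_genes), len(annotated), len(markers), len(markers & annotated)
    )


# Toy example (made-up gene sets): 20 background genes, 5 annotated with
# the term, 4 cluster markers of which 3 carry the annotation.
background = {f"G{i}" for i in range(20)}
term = {"G0", "G1", "G2", "G3", "G4"}
cluster_markers = {"G0", "G1", "G2", "G10"}
p_value = term_enrichment(cluster_markers, term, background)
print(p_value)
```

When many terms are tested at once, the resulting p-values would need a multiple-testing correction before being shown to the user.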

oneillkza commented 8 years ago

Another question: is the idea behind this project to precompute the clustering and dimensionality reduction? I feel like this would have significant benefits in terms of responsiveness and resource use of the interface. If it would be desired to tweak the number of clusters, this could also be precomputed (up to some reasonably high value for K, e.g. 100); the files would still be pretty small.
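A minimal sketch of that precomputation idea, assuming plain k-means on a low-dimensional projection (the thread does not say which clustering algorithm would be used): labels are computed once for every K in a range and stored, so the interface only has to look up the labels for the K the user picks. The function names and toy coordinates are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster went empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def precompute_clusterings(points, max_k):
    """Return {K: labels} for K = 2..max_k, ready to be written to disk."""
    return {k: kmeans_labels(points, k) for k in range(2, max_k + 1)}


# Toy 2-D coordinates (made up): two well-separated groups of three cells.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_by_k = precompute_clusterings(points, max_k=4)
print(labels_by_k[2])
```

Storing one integer label per cell per K keeps the files small, which is the point made above: even 100 values of K for tens of thousands of cells is only a few million integers.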

gracezheng commented 8 years ago

Yes, the idea is to precompute the clustering, dimensionality reduction, and so on. The 10x single cell pipeline does that (http://software.10xgenomics.com/single-cell/pipelines/latest/what-is-cell-ranger). What we really want to focus on here is creating an interactive tool so that scientists can focus on discovery instead of technical details such as manipulating matrices.
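As a sketch of how precomputed outputs might be fed to an interactive tool, the snippet below joins a projection table and a cluster table on cell barcode and returns column-oriented data, the shape most plotting libraries (bokeh included) accept as a data source. The file layout and the column names `Barcode`, `PC-1`, `PC-2` and `Cluster` are assumptions for illustration; check them against the actual pipeline output files before relying on them.

```python
import csv
import io


def load_columns(projection_file, clusters_file):
    """Join projection and cluster tables on barcode into a dict of columns.

    Both arguments are open text files (or file-like objects) in CSV form.
    The column names used here are assumed, not taken from the pipeline docs.
    """
    cluster_of = {
        row["Barcode"]: int(row["Cluster"]) for row in csv.DictReader(clusters_file)
    }
    columns = {"barcode": [], "x": [], "y": [], "cluster": []}
    for row in csv.DictReader(projection_file):
        barcode = row["Barcode"]
        if barcode not in cluster_of:
            continue  # skip cells that have no cluster assignment
        columns["barcode"].append(barcode)
        columns["x"].append(float(row["PC-1"]))
        columns["y"].append(float(row["PC-2"]))
        columns["cluster"].append(cluster_of[barcode])
    return columns


# In-memory stand-ins for the two text files (made-up values).
projection_csv = io.StringIO(
    "Barcode,PC-1,PC-2\n"
    "AAAC-1,1.5,-0.2\n"
    "AAAG-1,-3.0,0.7\n"
    "AAAT-1,0.1,2.4\n"
)
clusters_csv = io.StringIO(
    "Barcode,Cluster\n"
    "AAAC-1,1\n"
    "AAAT-1,2\n"
)
columns = load_columns(projection_csv, clusters_csv)
print(columns)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` calls on the pipeline's text outputs.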

Overlaying functionalities such as GO or pathway analysis will be nice, although everyone has their own preferences when it comes to such analysis. We can definitely explore this idea!

oneillkza commented 8 years ago

Sounds good!

It might be worth thinking about cell trajectory inference (e.g. one of Wishbone/Monocle/SCUBA). I'm pretty sure that could also be pre-computed and visualised using bokeh.

Grace, are you at ISMB? Unfortunately I missed the 10X dinner last night, but it'd be great to chat if you're around!

gracezheng commented 8 years ago

Hi Kieran, I'm not at ISMB. But my colleague, Paul Ryvkin, who also works on the single cell project, is attending the conference. Feel free to reach out to him at paul.ryvkin@10xgenomics.com

oneillkza commented 8 years ago

Thanks, will do!

qfwills commented 7 years ago

Hi Grace

You've probably noticed one or two mature single-cell processing/visual exploratory packages emerging: https://www.bioconductor.org/help/workflows/simpleSingleCell/ Scater (https://github.com/davismcc/scater) uses some of the most established data standards, and will have full (wrapper) ability to do pseudoalignments directly from the sequencer with the next Bioconductor release. The preprint is here: http://biorxiv.org/content/early/2016/08/15/069633. If your 10X and visualisation workflows fit with something like scater, I suspect you'll see very quick adoption and community contribution beyond hackseq.

Q

gracezheng commented 7 years ago

Hi Quin,

Thanks for the message. I have noticed Scater. We will test it with our own datasets after ASHG, and will let you know.

Grace

qfwills commented 7 years ago

Awesome. Just shout... I'm actually at LSI until Dec. Would love to hear how you guys get along. The popular exploratory visualisations are already in Scater, and I'm sure the interest is there to better plug into HTP platforms like 10x. See you, Mike, Tarjei and team at ASHG.

Q

Quin Wills www.devilontwosticks.org