romanhaa / Cerebro

Visualization of scRNA-seq data.
MIT License


:warning: Discontinuation notice: Sadly, Cerebro and cerebroApp are no longer in active development. See here for more info.

Cerebro


Screenshot Cerebro: overview panel

This is the standalone version of Cerebro (cell report browser), currently available for macOS and Windows. It allows users to interactively visualize various parts of single cell transcriptomics data without requiring bioinformatic expertise.

The core of Cerebro is the cerebroApp Shiny application, which is packaged into a standalone app using Electron. Because the core is a Shiny application, it can also be run on web servers and Linux machines, requiring only R and a set of dependencies.

Input data needs to be prepared using the cerebroApp R package which was built specifically for this purpose. It offers functionality to export a Seurat object (both v2 and v3 are supported) to the correct format in a single step. The file should be saved either with the .crb or .rds extension, indicating that internally it is an RDS object. Furthermore, the cerebroApp package also provides functions to perform a set of (optional) analyses, e.g. gene set enrichment analysis, pathway enrichment analysis based on marker gene lists of groups of cells, and more.

The exported .crb file is then loaded into Cerebro, which shows all available information.

Key features:

Basic examples of Seurat (v2 and v3) and scanpy workflows, including the subsequent export, can be found in the examples folder. There you can also find the raw data and the output file that can be loaded into Cerebro.

Further screenshots can be found in the screenshots folder.

Introduction to the Cerebro interface

Below is a brief description of what each panel of the Cerebro interface shows.

For a more detailed description, written for biologists without computational expertise, head over here.

Load data

Select input file (.rds or .crb). Shows number of cells, samples, clusters, as well as experiment name and organism.
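Since the input file is an RDS object saved with a .crb or .rds extension, a quick pre-flight check before loading can catch obviously wrong files. The following is a minimal sketch and not part of Cerebro or cerebroApp: the function name `looks_like_cerebro_input` is hypothetical, and the check assumes the file was written with the default gzip compression of R's saveRDS(). A file saved uncompressed or with another compression method would be flagged here even though it may still load.

```shell
# Hypothetical pre-flight check, not part of Cerebro or cerebroApp.
# Returns 0 if the file has a .crb/.rds extension and looks gzip-compressed.
looks_like_cerebro_input() {
  local file="$1" magic
  # The "Load data" panel expects a .crb or .rds file.
  case "$file" in
    *.crb|*.rds) ;;
    *) echo "unexpected extension: $file" >&2; return 1 ;;
  esac
  [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
  # gzip streams start with the bytes 1f 8b
  # (assumption: the file was saved with saveRDS() default compression).
  magic="$(od -An -tx1 -N2 "$file" | tr -d ' \n')"
  [ "$magic" = "1f8b" ] || { echo "not gzip-compressed: $file" >&2; return 1; }
}
```

For example, `looks_like_cerebro_input my_data.crb && echo "looks fine"` prints the message only if both checks pass. It does not verify that the content is a valid Cerebro export.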

Overview

Shows 2D and 3D dimensional reductions. Cells can be colored by metadata variables; a categorical or continuous color scale is chosen automatically. Cells can be randomly down-sampled to improve performance.

Samples

Shows sample-centric perspective of data.

Clusters

Shows cluster-centric perspective of data. See info about Samples panel above for more details.

Most expressed genes

If computed in cerebroApp, provides tables of most expressed genes by sample and cluster.

Marker genes

If computed in cerebroApp, provides tables of marker genes by sample and cluster.

Enriched pathways

If computed in cerebroApp, provides tables of enriched pathways in marker gene lists of samples and clusters.

Gene expression

Shows the expression of specified genes in the data set (the average per cell if multiple genes are given). Calculation is triggered by pressing SPACE or ENTER. Multiple genes must be entered on separate lines or separated by a space, comma, or semicolon. Shows which genes are available or missing (or misspelled) in the data set. Expression levels are shown in dimensional reductions and as violin plots for every sample and cluster. The average expression across all cells of the 50 most expressed genes (among those specified by the user) is shown as well, to quickly spot which genes drive the color scale.
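To illustrate the input rules of this panel, here is a minimal sketch of how a free-text gene query could be split and checked against the genes in a data set. It is not Cerebro's actual implementation: the function names `split_genes` and `check_genes` are hypothetical, the known genes are assumed to be in a plain text file with one name per line, and matching is assumed to be exact and case-sensitive.

```shell
# Hypothetical helper: split a free-text gene query into one gene per line.
# Separators follow the panel description: newline, space, comma, semicolon
# (tabs are also treated as separators here).
split_genes() {
  printf '%s\n' "$1" | tr -s ',; \t\n' '\n' | sed '/^$/d'
}

# Hypothetical helper: label each requested gene as available or missing,
# given a file with one known gene name per line.
# Assumption: exact, case-sensitive match on the whole name.
check_genes() {
  local query="$1" known_file="$2" gene
  split_genes "$query" | while IFS= read -r gene; do
    if grep -Fxq -- "$gene" "$known_file"; then
      printf 'available\t%s\n' "$gene"
    else
      printf 'missing\t%s\n' "$gene"
    fi
  done
}
```

With a file `genes.txt` that lists the gene names of the data set, `check_genes 'CD3E, CD4;MS4A1' genes.txt` prints one tab-separated line per requested gene, which mirrors the available/missing feedback described above.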

Gene set expression

Essentially the same as the gene expression panel, except that it allows selecting gene sets from MSigDB (requires an internet connection). Only available for human and mouse data.

Trajectory

This tab gives access to trajectory information, if data is available. Currently, we support trajectories generated by Monocle v2, which can be extracted through cerebroApp::extractMonocleTrajectory(). Multiple trajectories can be added to a single Seurat object, so the user needs to choose which of the available trajectories to visualize. Several interactive plots are shown, including the dimensional reduction, the distribution of categorical variables along pseudotime, the composition of transcriptional states by sample and cluster, and the distribution of transcript counts and number of expressed genes by state.

Gene ID conversion

Provides a table for converting between gene IDs and names. Includes GENCODE identifier, ENSEMBL identifier, HAVANA identifier, gene symbol and gene type. Only available for mouse and human. Based on GENCODE annotation version M16 (mouse) and version 27 (human).

Analysis info

Overview of parameters that were used during the analysis, as long as they were provided. Also shows list of mitochondrial and ribosomal genes present in the data set if computed with cerebroApp.

Motivation

Single cell RNA-sequencing data is rich and complex. Allowing experimental biologists to explore the results is beneficial for the iterative scientific process of performing analysis and deriving conclusions. Cerebro provides an easy way to access the data without any bioinformatic expertise.

Installation

For people without any experience in using the command line, getting access to Cerebro is probably easiest by downloading Cerebro for your OS from here, then unpacking and launching it. Currently, Cerebro is available only for macOS and Windows.

More experienced users of all platforms can alternatively launch the app through the dedicated cerebroApp R package (which is the core of Cerebro) or the romanhaa/cerebro Docker container.

Please check the image and table below for an overview of the supported operating systems and requirements of each way to start Cerebro.

Options to launch Cerebro.

|  | Standalone desktop application | cerebroApp R package | Docker container |
| --- | --- | --- | --- |
| Link | Releases | GitHub | Docker Hub |
| Supported OS | macOS, Windows | macOS, Windows, Linux | macOS, Windows, Linux (not all tested) |
| Requirements | - | R (3.5.1 or higher) | Docker client |
| Installation | Download current release from GitHub repository | Through BiocManager::install() | Pull container from Docker Hub |
| Launch Cerebro | Double-click executable | Inside R | Start container |

Details: cerebroApp R package

Requirements: R (version 3.5.1 or higher)

RStudio is a convenient IDE, but any R session works. Make sure to install cerebroApp using BiocManager::install() so that you get the most recent versions of its dependencies on Bioconductor.

BiocManager::install("romanhaa/cerebroApp")
cerebroApp::launchCerebro()

Details: romanhaa/cerebro Docker container

Requirements: Docker client

docker pull romanhaa/cerebro:latest
docker run -p 8080:8080 -v <export_folder>:/plots romanhaa/cerebro
# for example
docker run -p 8080:8080 -v ~/Desktop:/plots romanhaa/cerebro

Then, in your browser, navigate to the address printed in the terminal, e.g. 127.0.0.1:8080.

Note 1: Binding a local directory with -v <export_folder>:/plots is only necessary if you want to export dimensional reductions from Cerebro.

Note 2: If you need to change the port, you can do that like this:

docker run -p <port_of_choice>:8080 -v <export_folder>:/plots romanhaa/cerebro
# OR
docker run -p <port_of_choice>:<port_of_choice> -v <export_folder>:/plots romanhaa/cerebro Rscript -e 'shiny::runApp(cerebroApp::launchCerebro(), port=<port_of_choice>, host="0.0.0.0", launch.browser=FALSE)'
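If you change the port often, the first variant can be wrapped in a small helper. The sketch below only composes and prints the command (a dry run) so it can be inspected before running it. The function name `cerebro_docker_cmd` is hypothetical and not shipped with Cerebro. It assumes the container keeps listening on port 8080 and that the export folder path contains no double quotes.

```shell
# Hypothetical helper: compose the `docker run` command for Cerebro.
# Arguments: host port (default 8080) and export folder (default: current dir).
cerebro_docker_cmd() {
  local port="${1:-8080}" export_dir="${2:-$PWD}"
  # Reject anything that is not a plain number of at most five digits.
  case "$port" in
    ''|*[!0-9]*|??????*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  # Valid TCP ports are 1 to 65535.
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 1
  fi
  # The host port maps to container port 8080; the folder is quoted
  # so that paths with spaces survive copy and paste.
  printf 'docker run -p %s:8080 -v "%s":/plots romanhaa/cerebro\n' "$port" "$export_dir"
}
```

For example, `cerebro_docker_cmd 3838 /tmp/export` prints the command that maps host port 3838 to the container. To run it directly instead of printing it, pass the output to `sh -c`, e.g. `sh -c "$(cerebro_docker_cmd 3838 /tmp/export)"`.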

Example data sets

We provide documentation and commands for the following example data sets:

Conversion of other single cell data formats

Currently, the cerebroApp R package only provides functions to export a Seurat (v2 or v3) object to the Cerebro input file. However, there are a few other important single cell data storage formats, e.g. AnnData (used by scanpy), SingleCellExperiment (used by scran and scater), and CellDataSet (used by Monocle).

We believe using the existing network of conversion/exporting functions is more efficient than creating a dedicated export function for scanpy data. To highlight how data processed with scanpy (stored in AnnData format) can be prepared for loading into Cerebro, we have prepared a scanpy-based workflow for the pbmc_10k_v3 example data set.

In the figure below, we highlight how you can generate the Cerebro input file from any of the four major formats.

Single cell data formats

Technical notes

Building from source

On macOS

To package Cerebro you need Git and Node.js (which comes with npm) installed on your computer. Then, from the command line, run:

# clone this repository
git clone https://gitlab.com/romanhaa/Cerebro.git
# install Electron packager
npm install electron-packager --global
# go into the repository
cd Cerebro
# install dependencies
npm install
# run the app
npm start
# build the app
npm run package-mac

To build the Windows version under macOS, it is necessary to install Wine. I experienced problems with missing libraries in the stable version (4.0), so I recommend installing the development version (4.4) with Homebrew:

brew tap caskroom/versions
brew update
brew install caskroom/versions/wine-devel
npm run package-win

On Windows

If you're using Linux Bash for Windows, see this guide or use node from the command prompt.

Troubleshooting

Credits

Contribute

To report any bugs, submit patches, or request new features, please log an issue through the issue tracker. For direct inquiries, please send an email to roman.hillje@ieo.it.

Citation

If you used Cerebro for your research, please cite the following publication:

Roman Hillje, Pier Giuseppe Pelicci, Lucilla Luzi, Cerebro: interactive visualization of scRNA-seq data, Bioinformatics, btz877, https://doi.org/10.1093/bioinformatics/btz877

License

Copyright (c) 2019 Roman Hillje

The MIT License (MIT)