
---> Scaffold has moved! <---

There is a new and much improved version of this pipeline, which has been split into 3 separate packages and can be found here.

The main differences are as follows:


SCAFFoLD

Installation

Install a C++ compiler

You need a working C++ compiler to install SCAFFoLD. Follow the steps below for your operating system.

Mac OSX

You need to install the XCode software from Apple, which is freely available on the App Store. Depending on the version of XCode you are using, you might also need to install the "Command Line Tools" package separately. Please refer to the documentation for your XCode version.

Windows

Install the Rtools package, which is required for building R packages from source.

Linux

Install GCC. Refer to the documentation of your distribution to find the specific package name.
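
Before moving on, it can help to confirm that your R session can actually see a compiler. The following is a minimal sketch using only base R. The helper name find_build_tools and the list of tool names are assumptions made for illustration, not part of SCAFFoLD, and on Windows the lookup only succeeds if Rtools is on the PATH.

```r
# Hypothetical helper (not part of SCAFFoLD): report which common build
# tools are visible on the PATH of the current R session.
find_build_tools <- function(tools = c("gcc", "g++", "clang", "make")) {
  paths <- Sys.which(tools)  # returns "" for tools that are not found
  data.frame(tool  = tools,
             found = unname(nzchar(paths)),
             path  = unname(paths),
             stringsAsFactors = FALSE)
}

print(find_build_tools())
```

If no compiler shows up as found, revisit the platform-specific step above before installing the R packages.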

Install required R packages

You need to install the devtools package, available from CRAN, and the flowCore package from Bioconductor. The remaining SCAFFoLD dependencies will be installed automatically.

Devtools

Open an R session, type the following command and select a CRAN mirror when prompted.

install.packages("devtools")

FlowCore

Open an R session and type the following commands:

source("http://bioconductor.org/biocLite.R")
biocLite("flowCore")
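
The biocLite.R script has since been retired by Bioconductor, and R 3.5 or later uses the BiocManager package from CRAN instead. If the commands above fail on your system, the following sketch shows the equivalent step. The wrapper name install_flowcore is hypothetical and only keeps the installation from running until you call it.

```r
# Hypothetical wrapper: install flowCore through BiocManager, which replaced
# biocLite() in newer Bioconductor releases.
install_flowcore <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install("flowCore")
}

# Call it once from an interactive session with network access:
# install_flowcore()
```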

Install SCAFFoLD

Once you have successfully completed the steps above, start an R session and type the following commands:

library(devtools)
install_github("nolanlab/scaffold")

This will install the SCAFFoLD R package together with all the required dependencies. If everything was successful, you should be able to start SCAFFoLD by typing the following commands:

library(scaffold)
scaffold.run()

To stop SCAFFoLD, simply hit the "ESC" key in your R session.

Note: the latest version of devtools seems to occasionally have problems installing dependencies on Windows. If the installation of SCAFFoLD fails because of a missing package, please install the offending packages manually, using the R install.packages function.
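
To see which packages are missing before installing them by hand, a small check like the one below may help. This is only a sketch: missing_packages is a hypothetical helper, and the package names passed to it are the two prerequisites named in this README rather than the full dependency list.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
# system.file() returns "" for a package that cannot be found.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs,
                         function(p) nzchar(system.file(package = p)),
                         logical(1))
  pkgs[!is_installed]
}

to_install <- missing_packages(c("devtools", "flowCore"))
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # CRAN packages only; flowCore comes from Bioconductor
}
```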

Usage

When you launch the GUI you will be prompted to select a file. You can select any file in the directory you want to use as your working directory; this sets the working directory for the remainder of the session. SCAFFoLD will only look at files in your working directory, so everything you need must be there. If you add files to this directory, you will need to restart the interface in order to see them in the dropdown menus. The first step of the analysis is to cluster the FCS files.

Clustering

Select the "Run clustering" tab from the navigation bar at the top. In the clustering tab, select a representative FCS file and then select the markers that you want to use for the clustering. Click "Start clustering" and wait for the procedure to complete. For each FCS file, two files will be created:

  1. your-fcs-file.clustered.txt: this file contains the marker medians for each cluster
  2. your-fcs-file.clustered.all_events.RData: this file is an RData object which contains all the events in the original FCS file but with an added column that specifies the cluster membership. The data in this file is arcsinh transformed
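
If you want to inspect these two outputs from an R session, something along the following lines should work. It is a sketch under stated assumptions: the medians file is assumed to be tab-delimited with a header row, the name of the object stored in the RData file is not documented here (so it is discovered at load time), and the toy files, column names and values are made up so that the example runs on its own.

```r
# Toy stand-ins for the two output files, written to a temporary directory.
# Real files are produced by the "Run clustering" tab.
workdir <- tempfile("scaffold_demo_")
dir.create(workdir)
txt_path   <- file.path(workdir, "your-fcs-file.clustered.txt")
rdata_path <- file.path(workdir, "your-fcs-file.clustered.all_events.RData")

# Made-up column names and values, for illustration only.
toy_medians <- data.frame(cluster = c(1, 2), CD3 = c(0.1, 2.3), CD19 = c(1.8, 0.2))
write.table(toy_medians, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
toy_events <- data.frame(CD3 = c(0.1, 2.2, 2.4), CD19 = c(1.8, 0.1, 0.3),
                         cluster = c(1, 2, 2))
save(toy_events, file = rdata_path)

# Assumption: the medians table is tab-delimited with a header row.
medians <- read.delim(txt_path, check.names = FALSE)
print(medians)

# load() restores objects under the names they were saved with, so load into
# a fresh environment and look at what is inside before using it.
env <- new.env()
loaded_names <- load(rdata_path, envir = env)
print(loaded_names)
events <- get(loaded_names[1], envir = env)
str(events)
```

Loading into a separate environment avoids silently overwriting a variable of the same name in your workspace.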

The clustering is the only computationally intensive part of a SCAFFoLD analysis. Luckily, it only needs to be run once, as you can reuse these files to build multiple maps.
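
Because the clustering outputs are reusable, it can be handy to check which FCS files in the working directory still lack them. The helper below is a hypothetical sketch, not part of SCAFFoLD. Since the exact output naming is not spelled out here, it accepts either name.fcs.clustered.txt or name.clustered.txt as evidence that a file has been clustered.

```r
# Hypothetical helper: list FCS files in `dir` that have no matching
# .clustered.txt file next to them.
unclustered_fcs <- function(dir = getwd()) {
  fcs <- list.files(dir, pattern = "\\.fcs$", ignore.case = TRUE)
  if (length(fcs) == 0) {
    return(character(0))
  }
  stem <- sub("\\.fcs$", "", fcs, ignore.case = TRUE)
  has_output <-
    file.exists(file.path(dir, paste0(fcs, ".clustered.txt"))) |
    file.exists(file.path(dir, paste0(stem, ".clustered.txt")))
  fcs[!has_output]
}

# Usage: files in the current working directory that still need clustering.
print(unclustered_fcs())
```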

Construct a SCAFFoLD map

Switch to the "Run SCAFFoLD Analysis" tab by using the top navigation bar. Using the first drop-down menu, select the dataset that will act as the reference (the menu will only contain .clustered.txt files that are located in the current working directory). After you have chosen the markers that you want to use for the analysis, select Gated as the running mode. This will use any number of gated populations as landmark nodes in the graph (red nodes). The position of the landmark nodes will be constant across all the graphs you generate, providing a visual reference that allows you to compare the different datasets with each other.

The gated populations have to be provided as single FCS files (one for each population) that need to be located in a subdirectory called "gated" of the current working directory. The program will split the name of the FCS file using "_" as separator and the last field will be used as the population name. For instance if you want an FCS file to define your "B cells" population you have to use the following naming scheme:

WhateverYouWant_B cells.fcs
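
The naming rule can be mimicked in a few lines of R, which is useful for checking file names before starting a run. This is a sketch of the rule as described above, not SCAFFoLD's own code; population_name is a hypothetical helper, and stripping the .fcs extension before splitting is an assumption.

```r
# Hypothetical helper: derive the population name from a gated FCS file name
# by splitting on "_" and keeping the last field.
population_name <- function(path) {
  stem <- sub("\\.fcs$", "", basename(path), ignore.case = TRUE)
  fields <- strsplit(stem, "_", fixed = TRUE)[[1]]
  fields[length(fields)]
}

population_name("WhateverYouWant_B cells.fcs")      # "B cells"
population_name("gated/donor1_stim_CD4 T cells.fcs") # "CD4 T cells"
```

Note that a population name containing "_" would be truncated by this rule, so use spaces inside the population name itself.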

This is a rundown of the different options:

After you have specified all the parameters, click the "Start analysis" button. The run should be fairly quick and will create a single .scaffold file with the same name as the dataset that you used as the reference. This is a self-contained bundle that contains everything you need to browse the data. You can move it to any folder you want and share it with other users without having to share any of the original files.

Explore a SCAFFoLD map

Switch to the "Map exploration" tab by using the top navigation bar. This is a rundown of the operation of the different controls:

You can interact with the graph using the mouse as follows (node selections are used for plotting and exporting data, see below):

The table to the right of the graph shows statistics about either the entire graph, or the currently selected nodes. In the former case, the table shows statistics related to the number of cells for which each landmark in the Landmark column is the closest landmark. Conversely, when one or more nodes are selected, the table shows statistics related to the individual clusters.
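
To make the whole-graph case concrete, the sketch below tallies how many cells have each landmark as their closest one. The labels are made up, and the Cells and Percent columns are assumptions chosen for illustration; the actual table in the interface may report different statistics.

```r
# Made-up input: the closest landmark for each of eight cells.
closest_landmark <- c("B cells", "T cells", "T cells", "NK cells",
                      "T cells", "B cells", "T cells", "T cells")

# Count cells per landmark and express each count as a percentage.
counts <- table(closest_landmark)
summary_table <- data.frame(
  Landmark = names(counts),
  Cells    = as.integer(counts),
  Percent  = round(100 * as.integer(counts) / length(closest_landmark), 1),
  stringsAsFactors = FALSE
)
print(summary_table)
```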

One of the most useful ways to inspect a cluster is to plot the expression values for the cells that comprise the cluster as compared to the cells that define the landmark nodes the cluster is connected to. This can help you understand what is similar and what is different between a cluster and a landmark population. The plot generated with the options below will therefore contain all the selected clusters, and all the landmarks these clusters are connected to.

Map data onto an existing reference

Instead of starting a new map from scratch, you can map a set of clustered files onto an already existing scaffold analysis. This will generate a map with the same layout and the same landmarks as the original analysis. In order to do this select a reference .scaffold file from the left dropdown, and one of the sample clustered files from the right one (all files need to be located in the current working directory).

You can then use the two boxes to select the markers to be included in the mapping. Markers will appear in the gray area below, in the order in which you have chosen them.

Important: the markers will be mapped between the two datasets in the order displayed in the two gray boxes. The first marker in the right box will be considered equivalent to the first marker in the left box, the second to the second, and so on. It is therefore extremely important that the markers appear in the correct order in the gray boxes. You can drag and drop the markers to rearrange them, or simply enter them in the desired order to begin with (the latter is usually more convenient).
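
The positional pairing can be illustrated outside the GUI. In the sketch below, pair_markers is a hypothetical helper and the marker names are made up; it also assumes both boxes must hold the same number of markers, which is implied by the one-to-one mapping but not stated explicitly.

```r
# Hypothetical helper: pair markers strictly by position, the way the two
# gray boxes are described as being matched.
pair_markers <- function(reference, sample) {
  if (length(reference) != length(sample)) {
    stop("Both boxes must contain the same number of markers")
  }
  data.frame(position  = seq_along(reference),
             reference = reference,
             sample    = sample,
             stringsAsFactors = FALSE)
}

# Made-up marker names, for illustration only.
ref <- c("CD3", "CD19", "CD11b")
smp <- c("CD3e", "CD19", "CD11b")

print(pair_markers(ref, smp))       # intended pairing
print(pair_markers(ref, rev(smp)))  # wrong order: CD3 is paired with CD11b
```

Writing the intended pairs down like this before selecting markers makes it easier to enter them in the right order the first time.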

This is a rundown of the two options:

When you hit Start analysis, a new .scaffold file will be created in the working directory, containing the result of mapping the new data onto the existing reference dataset.