mdibl/scscape

Introduction

nf-core/scscape is a bioinformatics pipeline for multi-sample single-cell analysis downstream of count matrix generation. Many of its functional components are derived from the Seurat R package. Input data is expected as barcodes, features, and matrix files. Output includes Seurat objects that contain QC metrics, identified cell clusters, and dimensionally reduced projections that capture the experiment's gene expression variability.

Gzip all raw input files for consistency
Initialize Seurat object for each sample
Normalize gene expression counts & perform mitochondrial / cell-cycle scoring
Detect and remove suspected doublets from each sample
Merge - normalize - find variable features - scale data (SCTransform)
Run principal component analysis
Perform integration to remove technical confounding variables
Find k nearest-neighbors & cluster (Louvain)
Dimensionally reduce expression variance and plot

Documentation

The nf-core/scscape pipeline comes with documentation about the pipeline usage, parameters, and output.

[Figure: scscape workflow]

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Configuration

First, prepare a sample sheet with your input data that looks as follows:

Samples.csv:

id,data_directory,mt_cc_rm_genes
00_dpa_1,/filtered_feature_bc_matrix/,AuxillaryGeneList.csv

Each row represents one sample's matrix files (barcodes.tsv, features.tsv, matrix.mtx) and the associated gene list used in the analysis.
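Before launching a run, it can help to confirm that each data_directory actually holds the three matrix files. The following is a minimal sketch, not part of the pipeline: the function name is hypothetical, and it assumes that a gzipped copy (for example barcodes.tsv.gz) is an acceptable stand-in for the plain file.

```python
from pathlib import Path

# The three matrix files named in the sample sheet description.
REQUIRED_FILES = ("barcodes.tsv", "features.tsv", "matrix.mtx")


def missing_matrix_files(data_directory):
    """Return the required file names that are absent from data_directory.

    Assumption: a gzipped copy (name + ".gz") counts as present.
    """
    directory = Path(data_directory)
    missing = []
    for name in REQUIRED_FILES:
        plain = directory / name
        gzipped = directory / (name + ".gz")
        if not plain.is_file() and not gzipped.is_file():
            missing.append(name)
    return missing
```

Calling missing_matrix_files on the data_directory value of each sample sheet row returns an empty list when all three files are found.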

Second, list the mitochondrial, S phase, G2/M phase, and removal genes in the gene list file referenced by the sample sheet:

AuxillaryGeneList.csv:

MTgenes,G2Mgenes,Sgenes,RMgenes
mt-nd1,hmgb2a,mcm5,
mt-nd2,cdk1,pcna,
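The gene list is column-oriented, and columns may have different lengths (here RMgenes is empty). As an illustration only, the sketch below reads such a file into one list per column and drops the empty cells; the function name is hypothetical and this is not how the pipeline itself necessarily parses the file.

```python
import csv
import io


def read_gene_lists(csv_text):
    """Map each column header to its non-empty gene names, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    gene_lists = {header: [] for header in reader.fieldnames}
    for row in reader:
        for header in reader.fieldnames:
            # Short rows yield None for missing cells; treat them as empty.
            gene = (row.get(header) or "").strip()
            if gene:
                gene_lists[header].append(gene)
    return gene_lists


# The two data rows shown in the example file above.
example = """MTgenes,G2Mgenes,Sgenes,RMgenes
mt-nd1,hmgb2a,mcm5,
mt-nd2,cdk1,pcna,
"""

gene_lists = read_gene_lists(example)
print(gene_lists)
```

For the example rows, MTgenes holds mt-nd1 and mt-nd2, and RMgenes is an empty list.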

Finally, construct a segmentation file defining the analysis groups for the experiment (e.g., treatment, replicate, age, sex).

segmentation.csv:

id,00_dpa,04_dpa,all
00_dpa_1,true,false,true
00_dpa_2,true,false,true
04_dpa_1,false,true,true
04_dpa_2,false,true,true

Make sure the id columns match between segmentation.csv and Samples.csv.
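The id check can be automated before a run. The sketch below is a hypothetical helper, not pipeline code: it compares the id columns of the two files and lists which samples each group column selects, assuming that a value of true means the sample belongs to that group.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def id_mismatches(samples_rows, segmentation_rows):
    """Return (ids only in the sample sheet, ids only in the segmentation file)."""
    sample_ids = {row["id"] for row in samples_rows}
    segmentation_ids = {row["id"] for row in segmentation_rows}
    return (sorted(sample_ids - segmentation_ids),
            sorted(segmentation_ids - sample_ids))


def group_members(segmentation_rows):
    """Map each group column to the sample ids flagged "true" for it."""
    groups = {}
    for row in segmentation_rows:
        for column, flag in row.items():
            if column == "id":
                continue
            groups.setdefault(column, [])
            if (flag or "").strip().lower() == "true":
                groups[column].append(row["id"])
    return groups


# Contents copied from the example files above.
samples = read_rows("""id,data_directory,mt_cc_rm_genes
00_dpa_1,/filtered_feature_bc_matrix/,AuxillaryGeneList.csv
""")
segmentation = read_rows("""id,00_dpa,04_dpa,all
00_dpa_1,true,false,true
00_dpa_2,true,false,true
04_dpa_1,false,true,true
04_dpa_2,false,true,true
""")

only_in_samples, only_in_segmentation = id_mismatches(samples, segmentation)
groups = group_members(segmentation)
print(only_in_samples, only_in_segmentation)
print(groups)
```

Because the sample sheet excerpt above shows a single row, the check reports the other three ids as present only in the segmentation file; a complete sample sheet should produce two empty lists.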

Now, you can run the pipeline using:

nextflow run nf-core/scscape \
   -profile docker \
   -params-file paramaters.json \
   -c custom.config

Warning: Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.

Note: The pipeline can optionally create a .loupe file, which can be used with the 10x Loupe Browser to interactively explore your single-cell experiment. To generate the file, 10x requires you to read the 10x End User License Agreement and accept its terms by setting the eula_agreement parameter to Agree, in addition to setting makeLoupe to true.
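As a sketch of how the two Loupe-related settings might appear in a params file, the snippet below builds the corresponding JSON. It shows only the two parameters named in the note; the assumption that makeLoupe is a JSON boolean (rather than a string) should be checked against the parameter documentation, and a real params file would carry the pipeline's other parameters as well.

```python
import json

# Assumption: only the two Loupe-related parameters named above are shown.
loupe_params = {
    "makeLoupe": True,          # assumed to be a JSON boolean
    "eula_agreement": "Agree",  # set only after reading the 10x EULA
}

# Serialize to the JSON text that a params file would contain.
params_json = json.dumps(loupe_params, indent=2)
print(params_json)
```

The printed JSON can be merged into the file passed to the -params-file option.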

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/scscape was originally written by Ryan Seaman, Riley Grindle, and Joel Graber.

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #scscape channel (you can join with this invite).

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
