parmejohn / scRNA-pipeline-UofM

scRNA-seq Pipeline for Drogemoller and Wright Labs
GNU General Public License v3.0

scRNA-seq Analysis Pipeline

Installation

Dependencies

Script and container download

Script

Access to the repository must be requested beforehand. Alternatively, the pipeline can be run on the server under FOLDERNAME.

git clone https://github.com/parmejohn/scRNA-pipeline-UofM.git

Apptainer (Singularity)

The image can be downloaded from Sylabs or pulled using:

apptainer pull --arch amd64 library://parmejohn/uofm/scrnaseq_singularity:latest

Apptainer and Singularity are the same tool; Apptainer is the newer name. Note that with Apptainer the library:// remote may not be configured, in which case the pull command above can fail.
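If the pull fails because no library remote is configured, one possible fix is to register the Sylabs Cloud library as a remote and retry. The following is a minimal sketch, not part of the pipeline: the function name setup_library_remote and the label SylabsCloud are arbitrary, and cloud.sylabs.io is assumed to be the host serving the image referenced above. Running `apptainer remote list` first shows what is already configured.

```shell
# Sketch: register the Sylabs Cloud library with Apptainer so library:// pulls resolve.
# Assumptions: the image is hosted on cloud.sylabs.io; "SylabsCloud" is only a label.
setup_library_remote() {
    if ! command -v apptainer >/dev/null 2>&1; then
        echo "apptainer not found on PATH" >&2
        return 1
    fi
    # Add the endpoint without logging in, then make it the active remote.
    apptainer remote add --no-login SylabsCloud cloud.sylabs.io &&
        apptainer remote use SylabsCloud
}

# Usage (uncomment to run, then retry the pull):
# setup_library_remote &&
#     apptainer pull --arch amd64 library://parmejohn/uofm/scrnaseq_singularity:latest
```

If the remote already exists under another name, only the `apptainer remote use` step should be needed.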

Usage

Run the script from the downloaded directory:

nextflow run scRNA_pipeline.nf \
    --indir <input folder to sorted cellranger counts> \
    --outdir <output folder> \
    --species <species name> \
    --bind <bind file path locations> \
    --reference_seurat [path(s) to labelled reference seurat object] \
    --beginning_cluster [inferred earliest celltype] \
    --clusters_optimal [optimal number of clusters] \
    --resolution [0-9*] \
    --run_escape [true/false] \
    --pathways [PATHWAY1,PATHWAY2,...] \
    --run_sling [true/false] \
    --co_conditions [X1,X2,...] \
    --reduced_dim [integrated.cca | integrated.mnn | harmony] \
    --test_data [0-9*] \
    -with-apptainer path_to_image/scrnaseq_singularity_latest.sif

Arguments

Outputs

Analysis folder:

analysis/
├── data
│   ├── optimal_clusters.txt
│   ├── qc
│   │   ├── CONDITION1
│   │   │   ├── [sample_1]_soupx
│   │   │   │   ├── barcodes.tsv
│   │   │   │   ├── genes.tsv
│   │   │   │   └── matrix.mtx
│   │   │   └── ...
│   │   └── ...
│   ├── sce_slingshot.rds
│   ├── sc_integrated_milo_traj.rds
│   ├── se_filtered_list.rds
│   ├── se_filtered_singlets_list.rds
│   ├── se_integrated_auto_label.rds
│   ├── se_integrated_dimred.rds
│   ├── se_integrated_escape_norm.rds
│   ├── se_integrated_escape.rds
│   ├── se_integrated.rds
│   ├── se_list_raw.rds
│   ├── se_markers_presto_integrated.txt
│   └── ti
│       ├── ti_gene_clusters_slingPseudotime_*.txt
│       ├── ...
└── plots
    ├── conserved_marker_unlabelled.pdf
    ├── da
    │   ├── milo_DA_DE_heatmap_*.pdf
    │   ├── milo_DA_fc_distribution.pdf
    │   ├── milo_DA_umap.pdf
    │   ├── milo_pval_distribution.pdf
    │   └── milo_volcano_plot.pdf
    ├── deseq2
    │   ├── deseq2_cluster_[CLUSTER]_[CONDITION1]_vs_[CONDITION2].pdf
    │   ├── ...
    ├── gsea
    │   ├── comparative
    │   │   ├── gsea_cluster_[CLUSTER]_[CONDITION1]_vs_[CONDITION2].pdf
    │   │   ├── ...
    │   └── escape
    │       ├── escape_heatmap_top5.pdf
    │       ├── [CLUSTER]
    │       │   ├── GEYSER_PLOT_[1_path].pdf
    │       │   ├── ...
    │       ├── ...
    ├── integrated_elbow_plot.pdf
    ├── integrated_umap_grouped.pdf
    ├── integrated_umap_labelled.pdf
    ├── integrated_umap_split.pdf
    ├── integrated_umap_unlabelled.pdf
    ├── qc
    │   ├── [sample1]_soupx_nGenes_nUMI.pdf
    │   ├── [sample1]_soupx_percent_mt.pdf
    │   ├── ...
    ├── reference_marker_mapping_heatmap.pdf
    ├── ti
    │   ├── ti_de_slingPseudotime_*.pdf
    │   └── ti_start_smooth.pdf OR ti_no_start_not_smooth.pdf
    └── top3_markers_expr_heatmap.pdf

Data descriptions:

Plot descriptions:

Example

Cisplatin-treated and non-treated mice data.

Counts folder:

counts
├── CTRL
│   ├── pilot_study_C1 -> /home/projects/CIO/yard/run_cellranger_count/pilot_study_C1
│   └── pilot_study_C2 -> /home/projects/CIO/yard/run_cellranger_count/pilot_study_C2
└── TREAT
    ├── pilot_study_T1 -> /home/projects/CIO/yard/run_cellranger_count/pilot_study_T1
    └── pilot_study_T2 -> /home/projects/CIO/yard/run_cellranger_count/pilot_study_T2

The samples above are symlinks, used to save space. Apptainer/Singularity does not run with root access and must be told explicitly where files can be found, so the real paths that the symlinks point to also need to be included in the bound paths. Make sure REAL paths are used; an error will occur if a symlink path name is used instead.

nextflow run scRNA_pipeline.nf \
    --indir /home/projects/sc_pipelines/counts/ \
    --outdir /home/projects/sc_pipelines/test_run_nf_1 \
    --species musmusculus \
    --bind /home/projects/,/home/projects/CIO/yard/run_cellranger_count \
    --reference_seurat /home/projects/sc_pipelines/analysis/data/se_michalski.rds \
    --beginning_cluster Osteoblasts \
    -with-apptainer /home/phamj7@med.umanitoba.ca/bin/scrnaseq_singularity.sif
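The value passed to --bind in the command above can also be derived from the counts folder instead of being typed by hand. The helper below is a rough sketch under stated assumptions, not part of the pipeline: bind_paths is a hypothetical name, samples are assumed to sit at <indir>/<CONDITION>/<sample> as in the counts tree, and `readlink -f` is assumed to be available (GNU coreutils). It prints the resolved input folder plus the parent directory of every symlink target, comma-separated.

```shell
#!/usr/bin/env bash
# Sketch: build a comma-separated list of real directories to pass to --bind.
# Assumption: samples are laid out as <indir>/<CONDITION>/<sample>.
bind_paths() {
    local indir="$1"
    local sample
    {
        readlink -f "$indir"                      # the input folder itself, resolved
        for sample in "$indir"/*/*; do
            [ -L "$sample" ] || continue          # only symlinked samples need an extra bind
            dirname "$(readlink -f "$sample")"    # directory that holds the real sample
        done
    } | LC_ALL=C sort -u | paste -sd, -           # de-duplicate and join with commas
}

# Usage (paths taken from the example above):
# bind_paths /home/projects/sc_pipelines/counts
```

The output folder and any reference object live outside the counts folder, so their real locations would still have to be appended to the list manually.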

References

Bioinformatics Tools