jyxi7676 / STtools


Spatial Transcriptomic Tools (STtools)

STtools is a software package designed to process spatial transcriptomics (ST) data from multiple platforms, including Seq-Scope, Slide-seq, and Visium. The STtools pipeline covers preprocessing of raw sequence reads, alignment, collapsing barcodes into grids, clustering of cell types, and high-resolution analysis with a sliding-window strategy. STtools builds on existing software tools for single-cell and spatial transcriptomic analysis, such as STARsolo, Seurat, BayesSpace, and seqtk.
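To make "collapsing barcodes into grids" and the "sliding-window strategy" more concrete, here is a minimal shell/awk sketch. It is not the STtools implementation: the input layout (tab-separated `barcode`, `x`, `y`, `umi_count` with non-negative coordinates), the function names `collapse_to_grid` and `sliding_windows`, and the reading of the window parameter as the step between overlapping bins are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the STtools implementation.
# Assumed (hypothetical) input: tab-separated lines
#   barcode  x  y  umi_count
# with non-negative x/y coordinates.

# Collapse barcodes into non-overlapping square grid cells of side $1.
# Prints: grid_x  grid_y  total_umi   (one line per occupied cell)
collapse_to_grid() {
  awk -v b="$1" 'BEGIN { FS = OFS = "\t" }
    {
      gx = int($2 / b)          # column index of the cell
      gy = int($3 / b)          # row index of the cell
      umi[gx OFS gy] += $4      # sum UMI counts per cell
    }
    END { for (k in umi) print k, umi[k] }' | sort -k1,1n -k2,2n
}

# Sliding-window variant. ASSUMPTION: square windows of side $1 start every
# $2 units along each axis, so they overlap when $2 < $1 and each barcode
# is counted in every window that contains it.
sliding_windows() {
  awk -v b="$1" -v s="$2" 'BEGIN { FS = OFS = "\t" }
    {
      # Window k covers [k*s, k*s + b); walk down from the last window
      # that starts at or before the coordinate.
      for (kx = int($2 / s); kx >= 0 && kx * s + b > $2; kx--)
        for (ky = int($3 / s); ky >= 0 && ky * s + b > $3; ky--)
          umi[kx OFS ky] += $4
    }
    END { for (k in umi) print k, umi[k] }' | sort -k1,1n -k2,2n
}
```

For example, `printf 'AAA\t10\t20\t3\nCCC\t290\t5\t2\nGGG\t310\t20\t4\n' | collapse_to_grid 300` prints two tab-separated cells, `0 0 5` and `1 0 4`, while `printf 'AAA\t200\t100\t1\n' | sliding_windows 300 150` counts the single barcode in both windows `0 0` and `1 0`. Overlapping windows trade independence between bins for smoother, higher-resolution maps.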

Getting Started

We recommend running STtools on a Linux operating system (e.g. Ubuntu 18.04). See Installation for the software tools required to run STtools.
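Before running the commands below, a quick check like the following reports which of the command-line tools used in this walkthrough are missing from your PATH. The helper `missing_tools` is hypothetical and not part of STtools; STAR and seqtk are located through the `STARPATH` and `SEQTKPATH` variables instead, so they are not checked here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of STtools).
# Prints "missing: <tool>" for each argument not found on PATH and
# returns 1 if any tool is missing, 0 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Tools used by the Getting Started commands.
missing_tools git python3 gdown unzip || echo "Please install the tools listed above."
```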

## clone the repository
git clone https://github.com/jyxi7676/STtools.git
cd STtools
## install required python packages
python -m pip install -r requirements.txt
## download example data and decompress
gdown https://drive.google.com/uc?id=1e0u57Yu_fVKFvs-UA7WYfj-vgm8Nd2y4
unzip STtools_example_data.zip 
## create output directory and set environment variables
mkdir out
export STHOME=$(pwd)
export STDATA=$STHOME/STtools_example_data ## directory containing data
export STOUT=$STHOME/out             ## output directory
export SEQTKPATH=/path/to/seqtk/bin  ## path that contains seqtk binary
export STARPATH=/path/to/STAR/bin    ## path that contains STAR binary
export GENOMEINDEX=/path/to/STAR/index ## path that contains STAR index
## UNCOMMENT the lines below if you need to build the STAR index yourself for the example data:
## mkdir -p $STDATA/geneIndex/STARIndex
## $STARPATH/STAR --runThreadN 6 --runMode genomeGenerate --genomeDir $STDATA/geneIndex/STARIndex \
##     --genomeFastaFiles $STDATA/geneIndex/mm10.fasta \
##     --sjdbGTFfile $STDATA/geneIndex/mm10.gtf --sjdbOverhang 99
## export GENOMEINDEX=$STDATA/geneIndex/STARIndex/
## 
## Run STtools: steps A1 to V1
python3 $STHOME/sttools.py --run-all --STtools $STHOME \
  --first-fq $STDATA/stepA_extractCoordinates/liver-MiSeq-tile2106-sub-R1.fastq.gz \
  --second-fq1 $STDATA/stepA_align/liver_tile2106_sub_R1.fastq.gz \
  --second-fq2 $STDATA/stepA_align/liver_tile2106_sub_R2.fastq.gz \
  --outdir $STOUT --genome $GENOMEINDEX --star-path $STARPATH --seqtk-path $SEQTKPATH \
  --seqscope1st 'HiSeq' --clustering False --lane-tiles 1_2106 \
  --binsize 300 --window 150 -l 20 -o 'Sample' -c 2
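If you build the STAR index yourself (the commented-out commands above), the STAR manual recommends setting `--sjdbOverhang` to the read length minus 1, so a value of 99 would match 100-bp reads. The hypothetical helper below (not part of STtools) finds the longest read in a gzipped FASTQ file so you can derive the value for your own data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STtools): print the longest read length
# in a gzipped FASTQ file. Sequence lines are line 2 of every 4-line record.
max_read_length() {
  gzip -dc "$1" | awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) }
                       END { print m + 0 }'
}

# Example: derive --sjdbOverhang (read length - 1) from the read that is
# aligned to the genome, typically the second-round Read 2 (check your data):
#   echo $(( $(max_read_length "$STDATA/stepA_align/liver_tile2106_sub_R2.fastq.gz") - 1 ))
```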

The STtools package offers flexible options to run all steps, specific steps, or a range of consecutive steps. Several examples covering different scenarios are given below for illustration.

Overview of STtools

The image below illustrates the overall STtools workflow.

There are seven steps in total. Each step takes as input either the raw data or the outputs of previous steps. A brief explanation of each step is given below:

Installation

A Linux operating system is required to run the STtools package. You also need to install the following software tools and libraries/modules before using the package.
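Once STAR and seqtk are installed, you can confirm that the variables exported in Getting Started point at usable files. The helper `check_st_paths` is hypothetical (not part of STtools); it only assumes the binaries are named `STAR` and `seqtk`, matching the paths used in the Getting Started commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STtools): check that STARPATH, SEQTKPATH
# and GENOMEINDEX, as exported in Getting Started, point at usable files.
# Prints one message per problem and returns 1 if anything is wrong.
check_st_paths() {
  local status=0
  if [ ! -x "${STARPATH:-}/STAR" ]; then
    echo "STAR is not executable under STARPATH='${STARPATH:-}'"
    status=1
  fi
  if [ ! -x "${SEQTKPATH:-}/seqtk" ]; then
    echo "seqtk is not executable under SEQTKPATH='${SEQTKPATH:-}'"
    status=1
  fi
  if [ ! -d "${GENOMEINDEX:-}" ]; then
    echo "GENOMEINDEX='${GENOMEINDEX:-}' is not a directory"
    status=1
  fi
  return "$status"
}
```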

To install STtools, please run:

git clone https://github.com/jyxi7676/STtools.git

Example Data

Input Data Format

Please refer to data formats for an illustration of the input data format required by each step.

External links

Here are some useful external links: