Or you can use msPIPE on docker without having to prepare the environment. \< Recommended>
:point_right: HOW TO USE msPIPE on docker
git clone https://github.com/jkimlab/msPIPE.git
/PATH/TO/msPIPE/msPIPE.py -p params.conf -o OUTDIR
The parameter file must contain information necessary for pipeline execution.
params_format.conf
INPUT PARAMETER FILE FORMAT
[DMR]
ANALYSIS1 = sample1, sample2 (Two sample names for DMR analysis)
ANALYSIS2 = sample1, sample3 (Two sample names for DMR analysis)[REFERENCE]
UCSC_NAME = UCSC reference version name
FASTA = [path to reference fasta file (not required)]
GTF= [path to reference gtf file (not required)][LIB1]
SAMPLE_NAME = sample name
LIB_NAME = library name
LIB_TYPE = P or S (Paired-End or Single-Read)
FILE_1 = [path to sequencing read file]
FILE_2 = [path to sequencing read file]
usage: msPIPE.py [-h] --param params.conf --out PATH [--core int] [--qvalue float] [--skip_trimming] [--program bismark or bs2]
[--bsmooth] [--skip_mapping] [--skip_calling] [--calling_data PATH] [--skip_GMA] [--skip_DMR]
optional arguments:
-h, --help show this help message and exit
--param params.conf, -p params.conf
config format parameter file
--out PATH, -o PATH output directory
--core int, -c int core (default:5)
--qvalue float, -q float
q-value cutoff (default:0.5)
--skip_trimming skip the trimgalore trimming
--program bismark or bs2
program option for mapping & calling
--bsmooth use bsmooth for DMR analysis
--skip_mapping skip the bismark mapping
--skip_calling skip the methylation calling
--calling_data PATH, -m PATH
methylCALL directory
--skip_GMA skip the Gene-Methyl analysis
--skip_DMR skip the DMR analysis
You can choose the software to be used for analysis by selecting the module.
WGBS reads mapping & calling (--program bismark or bs2).
Use this option to select the tools to be used for mapping and calling of wgbs reads.
--program bismark : use Bismark with bowtie2 (default)
--program bs2 : use BS-Seeker2 with bowtie
Differential methylation analysis (--bsmooth)
With this option, BSmooth is used to analyze instead of methylKit and inhouse script.
You can leave out some pipeline steps with the --skip_\<STEP> option.
The main steps of the entire pipeline and the steps that can be omitted are as follows.
params_format.conf
INPUT WITH SKIP OPTIONS
...
[LIB1]
SAMPLE_NAME = sample name
LIB_NAME = library name
LIB_TYPE = P or S (Paired-End or Single-Read)
TRIMMED_FILE_1 = [path to preprocessed read file (with --skip_trimming option)]
TRIMMED_FILE_2 = [path to preprocessed read file (with --skip_trimming option)][LIB2]
SAMPLE_NAME = sample name
LIB_NAME = library name
LIB_TYPE = P or S (Paired-End or Single-Read)
BAM_FILE = [path to bismark mapping file (with --skip_mapping option)]
Running example using mouse rod WGBS data from Corso-Díaz, Ximena et al.
GEO accession | sample-24M | sample-3M |
---|---|---|
GSE134873 | rep1, rep2, rep3 | rep1, rep2, rep3 |
params_mouse.conf
Replace the '/PATH/TO/DATA' with a data path on your local server.
[DMR]
ANALYSIS1 = 24M, 3M
[REFERENCE]
UCSC_NAME = mm10
[LIB1]
SAMPLE_NAME = 24M
LIB_NAME = 24M_rep1
LIB_TYPE = P
FILE_1 = /PATH/TO/DATA/SRX6589858_1.fastq.gz
FILE_2 = /PATH/TO/DATA/SRX6589858_2.fastq.gz
[LIB2]
SAMPLE_NAME = 24M
LIB_NAME = 24M_rep2
LIB_TYPE = P
FILE_1 = /PATH/TO/DATA/SRX6589859_1.fastq.gz
FILE_2 = /PATH/TO/DATA/SRX6589859_2.fastq.gz
[LIB3]
SAMPLE_NAME = 24M
LIB_NAME = 24M_rep3
LIB_TYPE = P
FILE_1 = /PATH/TO/DATA/SRX6589860_1.fastq.gz
FILE_2 = /PATH/TO/DATA/SRX6589860_2.fastq.gz
[LIB4]
SAMPLE_NAME = 3M
LIB_NAME = 3M_rep1
LIB_TYPE = P
FILE_1 = /PATH/TO/DATA/SRX6589850_1.fastq.gz
FILE_2 = /PATH/TO/DATA/SRX6589850_2.fastq.gz
[LIB5]
SAMPLE_NAME = 3M
LIB_NAME = 3M_rep2
LIB_TYPE = P
FILE_1 = /PATH/TO/DATA/SRX6589851_1.fastq.gz
FILE_2 = /PATH/TO/DATA/SRX6589851_2.fastq.gz
[LIB6]
SAMPLE_NAME = 3M
LIB_NAME = 3M_rep3
LIB_TYPE = P
FILE_1 = /PATH/TO/DATA/SRX6589852_1.fastq.gz
FILE_2 = /PATH/TO/DATA/SRX6589852_2.fastq.gz
Running command
./msPIPE/msPIPE.py -p params_mouse.conf -o mouse_result -c 5 -q 0.5
All output created by msPIPE will be written to the methylCALL
and Analysis
directories in the given output
directory.
The output of pre-processing (read files processed by trimming and quality control), alignment, and methylation calling for each input library (named with LIB_NAME in config file) will be in methylCALL
directory.
The output of methylation analysis will be in Analysis
directory.
An example output structure of Analysis
directory
[Analysis]
|- avg_methlevel.pdf
|- [annotations]
|- [sample1]
|- ALL_TEXTFILES_AND_PLOTS_FOR_SAMPLE1
|- [AroundTSS]
|- [MethylSeekR]
|- [sample2]
|- ...
|- [DMR]
|- [sample1.sample2]
Output files and directories in Analysis
annotations
: a directory with information of genes, exons, introns, promoters, and intergenic regions in BED format filessample1
: a directory with all results of methylation analysis for sample1
AroundTSS
/meth_lv_3M.txt : for each gene, average methylation levels in bins around TSS (+/- 1500 bp)MethylSeekR
: a directory with all results for running MethylSeekRIf DMC/DMR analysis is performed, DMR
directory will be created in Analysis
directory. When the DMC/DMR analysis is performed, GO enrichment test will be carried out for the gene set with DMCs or DMRs in their promoter.
Examples of output files and directories in DMR
for comparison pair sample1 and sample2
sample1.sample2
: a directory with all results of DMC/DMR analysis, in this case sample1 will be treated as control and sample2 will be treated as case
methylkit
: output of running methylKitgit clone https://github.com/jkimlab/msPIPE.git
cd msPIPE
docker build -t jkimlab/mspipe:latest .
docker pull jkimlab/mspipe:latest
params_docker.conf
[DMR]
ANALYSIS1 = 24M, 3M
[REFERENCE]
UCSC_NAME = mm10
[LIB1]
SAMPLE_NAME = 24M
LIB_NAME = 24M_rep1
LIB_TYPE = P
FILE_1 = /msPIPE/data/SRX6589858_1.fastq.gz
FILE_2 = /msPIPE/data/SRX6589858_2.fastq.gz
[LIB2]
SAMPLE_NAME = 24M
LIB_NAME = 24M_rep2
LIB_TYPE = P
FILE_1 = /msPIPE/data/SRX6589859_1.fastq.gz
FILE_2 = /msPIPE/data/SRX6589859_2.fastq.gz
...
#docker run -v [local path]:[docker path] [docker image name] [msPIPE command]
docker run -v /PATH/TO/INPUT/DATA:/msPIPE/data:ro -v /PATH/TO/REUSABLE/REFERENCE:/msPIPE/reference -v /PATH/TO/OUTDIR:/work_dir/ jkimlab/mspipe:latest msPIPE.py -p params_docker.conf -o result
Replace the '/PATH/TO/*' with a directory path on your local server.