Parse, Align and Call Hotspots from single-stranded DNA reads originating from SSDS (Single-Stranded DNA Sequencing)
The SSDS method was originally published by Khil et al., 2012.
The objective is to map double-strand breaks (DSBs) along the genome.
In this method, chromatin is extracted from adult testes and immunoprecipitated with an antibody against the DMC1 protein, a meiosis-specific recombinase. DMC1 coats the single-stranded DNA resulting from the resection of double-strand breaks (DSBs). SSDS exploits the ability of single-stranded DNA to form hairpins. The pipeline processes these specific data and identifies recombination hotspots.
See this review by Frédéric Baudat, Yukiko Imai and Bernard de Massy to learn more about meiotic recombination.
This pipeline is based on the SSDS pipeline and the SSDS call peaks pipeline by Kevin Brick, PhD (NIH), updated and adapted to the IGH cluster.
See initial paper and technical paper.
The pipeline uses Nextflow > 20.04.1.
Briefly, the updates from SSDS pipeline version 1.8_NF include conda/mamba/singularity/docker execution profiles, input modifications, peak calling, post-processing, the addition of an IDR procedure, and global Nextflow homogenization.
The main steps of the pipeline include:
In detail, the pipeline is composed of 26 processes:
The hotSSDS nextflow pipeline requires Nextflow DSL1 version 20.10.0, and at least one of the following:
Singularity is the preferred execution profile, as it ensures full portability.
Nextflow can easily be installed using the conda package manager [https://anaconda.org/bioconda/nextflow]. Downloading the pipeline preferably requires git; otherwise the pipeline can be downloaded with a command-line file-retrieval program such as wget. The hotSSDS pipeline relies on the pre-existence of genome references on your system; if you need to download them, bwa and samtools will be required as well.
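For example, a minimal conda-based installation could look like the following sketch (the environment name "ssds" is an arbitrary choice; the bioconda channel comes from the link above):
# Create a dedicated environment holding Nextflow, then activate it
conda create -n ssds -c bioconda nextflow
conda activate ssds
# Check the installation
nextflow -version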
Using git (recommended)
git clone https://github.com/jajclement/hotSSDS.git
cd hotSSDS
Or download the zip file using wget, then unzip it
wget https://github.com/jajclement/hotSSDS/archive/refs/heads/master.zip
unzip hotSSDS-master.zip
mv hotSSDS-master hotSSDS
cd hotSSDS
Singularity images are used to encapsulate all required software and dependencies for the different steps of the pipeline, making the pipeline portable across systems. As they can be voluminous, they are not included in the pipeline git repository.
Before running the pipeline with the Singularity execution profile, it is necessary to download the images from the Zenodo "hotSSDS Pipeline Singularity Images" open repository.
Two options are available for this, depending on whether the computing environment on which the pipeline will be executed has Internet access (see point a) or not (see point b).
a. The computing environment has Internet access: use the --get_sif option.
The --get_sif option launches a "dry run" that checks for the presence of the Singularity images in the pipeline directory. If they are not present, the pipeline downloads them. Once the download is complete, the pipeline stops. It can then be run without --get_sif to perform the hotSSDS analyses.
nextflow run main.nf -c conf/cluster.config \
-params-file conf/test.json \
-profile test,<singularity|mamba|conda|docker> \
--get_sif >& get_sif_main_log.txt 2>&1
b. The computing environment has no Internet access: download all 10 .sif files from the Zenodo open repository at https://zenodo.org/record/7783473 and place them in the hotSSDS/containers folder, so that the final repository structure is:
containers/
├── bam-box-1.0
│   ├── bam-box_1.0.sif
│   ├── Dockerfile
│   └── environment.yml
├── bigwig-box-1.0
│   ├── bigwig-box-1.0.sif
│   ├── Dockerfile
│   └── environment.yml
├── frip-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── frip-box_1.0.sif
├── idr-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── idr-box_1.0.sif
├── multiqc-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── multiqc-box_1.0.sif
├── peak-calling-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── peak-calling-box_1.0.sif
├── plot-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── plot-box_1.0.sif
├── python-3.8
│   ├── environment.yml
│   └── python-3.8.sif
├── ssds-qc-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── ssds-qc-box_1.0.sif
└── trimming-box-1.0
    ├── Dockerfile
    ├── environment.yml
    └── trimming-box_1.0.sif
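If you prefer to script the manual download, the following sketch assumes the Zenodo file URLs are simply the repository address followed by the image name (the pattern of the --url_sif default shown further below):
# Run from the hotSSDS directory; each image goes into its expected sub-folder
BASE=https://zenodo.org/record/7783473/files
wget -P containers/bam-box-1.0 ${BASE}/bam-box_1.0.sif
wget -P containers/bigwig-box-1.0 ${BASE}/bigwig-box-1.0.sif
wget -P containers/frip-box-1.0 ${BASE}/frip-box_1.0.sif
wget -P containers/idr-box-1.0 ${BASE}/idr-box_1.0.sif
wget -P containers/multiqc-box-1.0 ${BASE}/multiqc-box_1.0.sif
wget -P containers/peak-calling-box-1.0 ${BASE}/peak-calling-box_1.0.sif
wget -P containers/plot-box-1.0 ${BASE}/plot-box_1.0.sif
wget -P containers/python-3.8 ${BASE}/python-3.8.sif
wget -P containers/ssds-qc-box-1.0 ${BASE}/ssds-qc-box_1.0.sif
wget -P containers/trimming-box-1.0 ${BASE}/trimming-box_1.0.sif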
Edit the hotSSDS/conf/cluster.config file to adjust the parameters to your computing cluster.
The following sections are expected to be overwritten:
DEFAULT CLUSTER CONFIGURATION section
PROFILES SPECIFIC PARAMETERS section
GENOMES LOCATION section:
Write the absolute paths to your reference genome(s):
fai : absolute path to the genome fai index, to create with the faidx tool:
samtools faidx <ref.fasta> -o <ref.fai>
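If the reference genome is not yet present on your system, a typical preparation could look like this sketch (paths and file names are placeholders; bwa index produces the index files expected by --genome_fasta):
# Run once per reference genome; adjust paths to your system
bwa index /path/to/genomes/mm10/genome.fa        # creates the BWA index files next to the fasta
samtools faidx /path/to/genomes/mm10/genome.fa   # creates genome.fa.fai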
If needed, edit the hotSSDS/conf/resources.config file to adjust the resources of specific processes. To do so, edit cpus/memory/time in the PROCESSES SPECIFIC RESSOURCES REQUIREMENTS section. You can also assign specific computing queues to some categories.
It is important to note that many institutes have such a configuration file referenced in the nf-core/configs repository, which you can download and adapt to the hotSSDS pipeline.
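For example, an institutional configuration can be retrieved from nf-core/configs like this (the institute name is a placeholder to replace with an existing file from that repository):
wget https://raw.githubusercontent.com/nf-core/configs/master/conf/<your_institute>.config
# then adapt its content to the structure of hotSSDS/conf/cluster.config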
The pipeline only processes paired-end data in fastq(.gz) format. The input data must be described in an input csv file with the following 6 fields:
group,replicate,fastq_1,fastq_2,antibody,control
DMC1-chip-WT,1,/path/to/data/SRR1035576_R1.fastq.gz,/path/to/data/SRR1035576_R2.fastq.gz,antiDMC1,Input-WT
DMC1-chip-WT,2,/path/to/data/SRR1035577_R1.fastq.gz,/path/to/data/SRR1035577_R2.fastq.gz,antiDMC1,Input-WT
DMC1-chip-KO,1,/path/to/data/SRR1035578_R1.fastq.gz,/path/to/data/SRR1035578_R2.fastq.gz,antiDMC1,Input-KO
DMC1-chip-KO,2,/path/to/data/SRR1035579_R1.fastq.gz,/path/to/data/SRR1035579_R2.fastq.gz,antiDMC1,Input-KO
Input-WT,1,/path/to/data/SRR1035580_R1.fastq.gz,/path/to/data/SRR1035580_R2.fastq.gz,,
Input-KO,1,/path/to/data/SRR1035581_R1.fastq.gz,/path/to/data/SRR1035581_R2.fastq.gz,,
Replicate samples must share the same "group" ID, the same "antibody" and the same control group.
If a sample has no associated input control sample, leave the last fields empty.
Control (input) samples must have the last 2 fields ("antibody" and "control") empty.
There must be no empty line at the end of the csv file.
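A quick way to sanity-check the csv before launching, as a minimal sketch assuming the file is named input.csv:
# Every line must contain exactly 6 comma-separated fields
awk -F',' 'NF != 6 {print "line " NR " has " NF " fields"}' input.csv
# The last line must not be empty
[ -n "$(tail -n 1 input.csv)" ] || echo "remove the empty last line"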
The complete list of parameters is accessible through the command:
nextflow run main.nf --help
N E X T F L O W ~ version 20.10.0
Launching `main.nf` [reverent_perlman] - revision: 5244f2a6f5
=============================================================================
hotSSDS pipeline version 2.0 : Align, parse and call hotspots from SSDNA
=============================================================================
Usage:
nextflow run main.nf -c conf/cluster.config -params-file conf/mm10.json --inputcsv tests/fastq/input.csv --name "runtest" --trim_cropR1 36 --trim_cropR2 40 --with_trimgalore -profile singularity -resume
Runs with Nextflow DSL1 v20.10.0
=============================================================================
Input data parameters:
--inputcsv FILE PATH TO INPUT CSV FILE (template and default : hotSSDS/tests/fastq/input.csv)
-params-file FILE PATH TO PARAMETERS JSON FILE (template and default : hotSSDS/conf/mm10.json)
--genomebase DIR PATH TO REFERENCE GENOMES
--genome STRING REFERENCE GENOME NAME (must correspond to an existing genome in your config file, default : "mm10")
--genomedir DIR PATH TO GENOME DIRECTORY (required if your reference genome is not present in your config file)
--genome_name STRING REFERENCE GENOME NAME (e.g. "mm10", required if your reference genome is not present in your config file)
--genome_fasta FILE PATH TO GENOME FASTA FILE WITH PREEXISTING INDEX FILES FOR BWA (required if your reference genome is not present in your config file)
--fai FILE PATH TO GENOME FAI INDEX FILE (required if your reference genome is not present in your config file)
--genome2screen STRING GENOMES TO SCREEN FOR FASTQC SCREENING (default : ['mm10','hg19','dm3','dm6','hg38','sacCer2','sacCer3'], comma separated list of genomes to screen reads for contamination, names must correspond to existing genomes in your config file)
--chrsize FILE Chromosome sizes file, default : ssdsnextflowpipeline/data/mm10/mm10.chrom.sizes (downloaded from https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.chrom.sizes 2021-01-11)
--hotspots DIR PATH TO HOTSPOTS FILES DIRECTORY (set to "None" if none provided ; default : hotSSDS/data/hotspots/mm10/hotspots)
--blacklist FILE PATH TO BLACKLIST BED FILE FOR PEAK CALLING AND IDR (set to "None" if none provided ; default : hotSSDS/data/blacklist/mm10/blackList.bed)
Output and temporary directory parameters:
--name STRING ANALYSIS NAME (default : "hotSSDSPipeline")
--outdir DIR PATH TO OUTPUT DIRECTORY (default : hotSSDS/{params.name}.outdir/02_results")
--publishdir_mode STRING MODE FOR EXPORTING PROCESS OUTPUT FILES TO OUTPUT DIRECTORY (default : "copy", must be "symlink", "rellink", "link", "copy", "copyNoFollow","move", see https://www.nextflow.io/docs/latest/process.html)
Pipeline dependencies:
--src DIR PATH TO SOURCE DIRECTORY (default : hotSSDS/bin ; contains perl scripts)
--custom_bwa EXE PATH TO CUSTOM BWA EXEC (default : hotSSDS/bin/bwa_0.7.12)
--custom_bwa_ra EXE PATH TO CUSTOM BWA_SRA EXEC (default : hotSSDS/bin/bwa_ra_0.7.12)
Trimming parameters:
--with_trimgalore BOOL Use trim-galore instead of Trimmomatic for quality trimming process (default : false)
--trimgalore_adapters FILE trim-galore : PATH TO ADAPTERS FILE (default : none)
--trimg_quality INT trim-galore : minimum quality (default 10)
--trimg_stringency INT trim-galore : trimming stringency (default 6)
--trim_minlen INT trimmomatic : minimum length of reads after trimming (default 25)
--trim_cropR1 INT fastx : Cut the R1 read to that specified length (default 50)
--trim_cropR2 INT fastx : Cut the R2 read to that specified length (default 50)
--trim_slidingwin STRING trimmomatic : perform a sliding window trimming, cutting once the average quality within the window falls below a threshold (default "4:15")
--trim_illumina_clip STRING trimmomatic : Cut adapter and other illumina-specific sequences from the read (default "2:20:10")
--trimmomatic_adapters FILE PATH TO ADAPTERS FILE FOR TRIMMOMATIC (default hotSSDS/data/TruSeq2-PE.fa, special formatting see http://www.usadellab.org/cms/?page=trimmomatic)
Mapping parameters:
--with_multimap BOOL Keep multimapping reads from bam (default : false)
--bamPGline STRING bam header (default '@PG\tID:ssDNAPipeline2.0_PAUFFRET')
--filtering_flag INT SAM flag for filtering bam files (default : 2052 ; see https://broadinstitute.github.io/picard/explain-flags.html)
--picard_min_distance INT Picard parameter for marking duplicates (--MINIMUM_DISTANCE) : width of the window to search for duplicates of a given alignment, default : -1 (twice the first read's read length)
--picard_optdup_distance INT Picard parameter for marking duplicates (--OPTICAL_DUPLICATE_PIXEL_DISTANCE) : The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform (HiSeq2500). For the patterned flowcell models (Novaseq 6000), 2500 is more appropriate, default : 100
--get_supp BOOL Publish bam files for supplementary alignments, default : false
Bigwig parameter:
--bigwig_profile STRING Bigwig profile using bedtools (normalization by total library size) : "T1" will produce bigwig for T1 bed files only, one per replicates ; "T12" will also produce bigwig for merged T1+T2, one per replicates ; "T1rep" will also produce T1 bigwig for merged replicates ; "T12rep" will also produce T1+T2 bigwig for merged replicates (default : "T1")
--kbrick_bigwig BOOL Compute bigwig files ; FR bigwig files and coverage plots as in original pipeline by Kevin Brick using deeptools with FPKM normalization (default : false)
--binsize INT Deeptools binsize parameter (used only if kbrick_bigwig is TRUE ; default : 50)
Peak calling parameters:
--with_control BOOL Use input control files for peak calling analysis (default : false)
--satcurve BOOL Plot saturation curve (default : false)
--sctype STRING Saturation curve type (either 'minimal', 'standard' or 'expanded' ; default : 'standard')
--reps INT Number of iterations for saturation curve (default : 3)
--bed_trimqual INT Mapping quality threshold for bed filtering (default : 30)
--macs_bw INT Macs2 callpeak bandwidth parameter (default : 1000)
--macs_slocal INT Macs2 callpeak slocal parameter (default : 5000)
--macs_extsize INT Macs2 callpeak extsize parameter (default : 800)
--macs_qv FLOAT Macs2 callpeak q-value parameter (default : 0.1)
--macs_pv FLOAT Macs2 callpeak p-value parameter, if not -1, will overrule macs_qv, see macs2 doc (default : -1)
--no_chrY BOOL Filter out chromosomeY peaks from final peak bed files (default : true)
Optional IDR analysis parameters (ENCODE procedure, see https://github.com/ENCODE-DCC/chip-seq-pipeline2) :
--with_idr BOOL Perform IDR analysis, only possible if nb_replicates=2 (default : false)
--nb_replicates INT Number of replicates per sample (default : 2)
--idr_peaktype STRING The peak file format for IDR (narrowPeak, regionPeak or broadPeak, default : "regionPeak")
--idr_setup STRING Threshold profile for idr. This will define the thresholds for true replicates, pool replicates, self replicates r1 and self replicates r2. Profile "auto" is based on ENCODE guidelines and profile "custom" allows to set custom thresholds (see parameters --idr_threshold_r1 --idr_threshold_r2 --idr_threshold_truerep and --idr_threshold_poolrep ; default : auto)
--idr_threshold_r1 FLOAT idr threshold for self replicates r1 (used if --idr_setup is "custom" only ; default : 0.05)
--idr_threshold_r2 FLOAT idr threshold for self replicates r2 (used if --idr_setup is "custom" only ; default : 0.05)
--idr_threshold_truerep FLOAT idr threshold for true replicates (used if --idr_setup is "custom" only ; default : 0.05)
--idr_threshold_poolrep FLOAT idr threshold for pooled replicates (used if --idr_setup is "custom" only ; default : 0.01)
--idr_rank INT p.value or q.value (default : p.value)
--idr_filtering_pattern STRING Regex for filtering bed files (default :"chr[1-9X]+" for mouse ; set ".*" to keep everything)
--idr_macs_qv FLOAT Macs2 callpeak q-value parameter (default : -1)
--idr_macs_pv FLOAT Macs2 callpeak p-value parameter, if not -1, will overrule macs_qv, see macs2 doc (default : 0.1)
QC parameters:
--with_ssds_multiqc BOOL RUN SSDS MULTIQC (default : true)
--multiqc_configfile FILE OPTIONAL : PATH TO MULTIQC CUSTOM CONFIG FILE (default : hotSSDS/conf/multiqc_config.yaml)
Nextflow Tower parameter:
-with-tower BOOL Enable job monitoring with Nextflow tower (https://tower.nf/)
Singularity images parameters:
--get_sif BOOL [REQUIRE INTERNET ACCESS] Check and download singularity images if necessary (if true, the pipeline will stop after the download; once the download is done, relaunch the pipeline with false ; default: false)
--url_sif URL URL TO PUBLIC SINGULARITY IMAGES REPOSITORY (default : https://zenodo.org/record/7783473/files)
Pipeline parameters can be set in two different ways: in the parameter json file, or on the command line.
For Mus musculus based analyses, the parameter file hotSSDS/conf/mm10.json contains default parameters that can be overwritten.
One important thing to note: in Nextflow command lines, native options are preceded by a single dash (e.g. -profile), while parameters specific to the hotSSDS pipeline are preceded by two dashes (e.g. --genome 'mm10').
Once you have set up the computing configuration file and the parameter file, you can launch the pipeline using the following command line:
nextflow run main.nf -c conf/cluster.config \
-params-file conf/mm10.json \
--inputcsv /path/to/input.csv \
-profile <singularity|mamba|conda|docker> \
--name "My_workflow_name" >& main_log.txt 2>&1
It is highly recommended to launch the command as a batch job on the computing cluster, as its execution will take time and computing resources. It is also recommended to redirect the output of the main nextflow command line to an identified log file, which will be useful for monitoring the pipeline execution.
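As an illustration, a batch submission could look like the following sketch (the example logs below use the pbspro executor; the job name, resources and paths are placeholders to adapt to your cluster):
#!/bin/bash
#PBS -N hotSSDS_main
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=48:00:00
cd /path/to/hotSSDS
nextflow run main.nf -c conf/cluster.config \
    -params-file conf/mm10.json \
    --inputcsv /path/to/input.csv \
    -profile singularity \
    --name "My_workflow_name" >& main_log.txt 2>&1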
The main parameters that need to be set are:
--inputcsv : path to the input csv file
-params-file : json file containing the list of parameters
-profile : <conda|mamba|docker|singularity>
-resume : prevents the entire workflow from being rerun if you need to relaunch an aborted workflow
--name : analysis name, e.g. "SSDS_SRA5678_DMC1"
--outdir : path to the output directory
--genome : the reference genome
--with_control (true/false) : use input control files
--no_multimap (true/false) : remove multimappers from bam files
--nb_replicates : number of biological replicates you are running with (maximum 2)
--bigwig_profile : indicates which bigwig files to generate (T1 ; T12 ; T1rep or T12rep)
--with_idr (true/false) : run IDR analysis (if nb_replicates=2)
--satcurve (true/false) : plot the saturation curve
--with_ssds_multiqc (true/false) : generate SSDS QC plots
A small dataset can be used to test whether the pipeline runs correctly on your system.
To do so, run:
nextflow run main.nf -c conf/cluster.config \
-params-file conf/test.json \
-profile test,<singularity|mamba|conda|docker> >& test_main_log.txt 2>&1
This test run should take approximately 5 minutes to complete.
On completion, the end of the main log file test_main_log.txt should look like:
executor > pbspro (25)
[aa/ff63f4] process > check_design (input.csv) [100%] 1 of 1 ✔
[1f/c81a5f] process > makeScreenConfigFile (TEST_SSDS) [100%] 1 of 1 ✔
[f1/341fad] process > crop (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[fe/67d328] process > trimming (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[cb/a77283] process > bwaAlign (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[38/e2e8c6] process > filterBam (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[af/05397c] process > parseITRs (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[d1/b74fbc] process > makeBigwig (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[1f/256f21] process > shufBEDs (TEST_IP_R1) [100%] 1 of 1 ✔
[3c/f2b936] process > callPeaks (TEST_IP_R1) [100%] 5 of 5 ✔
[43/655990] process > samStats (TEST_IP_R1_T1) [100%] 5 of 5 ✔
[18/200309] process > makeSSreport (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[8f/19ef51] process > makeFingerPrint (TEST_SSDS) [100%] 1 of 1 ✔
[c8/a77ede] process > ssds_multiqc (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[85/6daa72] process > normalizePeaks (TEST_IP_R1) [100%] 1 of 1 ✔
[6a/122837] process > makeSatCurve (TEST_SSDS) [100%] 1 of 1 ✔
[a7/a00c9c] process > general_multiqc (TEST_SSDS) [100%] 1 of 1 ✔
Completed at: 10-Mar-2023 13:37:48
Duration : 3m 57s
CPU hours : 0.3
Succeeded : 25
You can use the -with-tower option to monitor your jobs through the Nextflow Tower web interface.
You can also follow the progress of a run in the main log file, for example with tail -f main_log.txt:
executor > pbspro (1)
[21/3786f7] process > check_design (Nore_input_fi... [100%] 1 of 1 ✔
[57/bc36de] process > makeScreenConfigFile (SSDS_... [100%] 1 of 1 ✔
[d3/c42056] process > crop (WT_R2_T1) [100%] 4 of 4 ✔
[7d/c06fe8] process > trimming (WT_R2_T1) [100%] 4 of 4 ✔
[ca/74adfd] process > bwaAlign (WT_R2_T1) [100%] 4 of 4 ✔
[f0/dec2f6] process > filterBam (WT_R2_T1) [100%] 4 of 4 ✔
[aa/39f5f4] process > parseITRs (WT_R2_T1) [100%] 4 of 4 ✔
[79/c5fd62] process > makeBigwig (WT_R2_T1) [100%] 4 of 4 ✔
[22/a02566] process > makeDeeptoolsBigWig (WT_R2_T1) [100%] 20 of 20 ✔
[1b/218fa3] process > toFRBigWig (WT_R2_T1) [100%] 20 of 20 ✔
[24/35b6ac] process > shufBEDs (WT_R1) [100%] 4 of 4 ✔
[78/1c307b] process > callPeaks (MUT_R1) [100%] 9 of 9 ✔
[18/b8edd6] process > samStats (WT_R2_T1) [100%] 20 of 20 ✔
[e4/4461d1] process > makeSSreport (WT_R2_T1) [100%] 4 of 4 ✔
[4b/e769b4] process > makeFingerPrint (SSDS_pipel... [100%] 1 of 1 ✔
[58/02be5a] process > ssds_multiqc (WT_R2_T1) [100%] 4 of 4 ✔
[5b/be1b53] process > createPseudoReplicates (MUT) [ 50%] 1 of 2
[37/0acb1c] process > callPeaksForIDR (WT) [100%] 1 of 1 ✔
[- ] process > IDRanalysis -
[- ] process > IDRpostprocess -
[- ] process > normalizePeaks_idr -
[- ] process > makeSatCurve -
[- ] process > general_multiqc
The main output folder is specified using the --outdir parameter.
This folder will contain the following directories:
Tree overview of the output folder composition [DEPRECATED]:
.
├── bigwig : contains bigwig files according to the parameter set
│   └── T1
│       └── log
├── qc : contains quality control files, pictures and reports
│   ├── multiqc : contains a summary of QC stats for all processes
│   │   ├── *.multiQC.quality-control.report_plots
│   │   │   ├── svg
│   │   │   ├── png
│   │   │   └── pdf
│   │   └── *.multiQC.quality-control.report_data
│   ├── samstats : contains mapping statistics tabs
│   ├── ssds : contains tabs and plots about SSDS parsing statistics
│   ├── flagstat : contains mapping statistics files
│   ├── fingerprint : contains fingerprint plots
│   ├── trim_fastqc : contains fastqc reports for trimmed reads
│   ├── raw_fastqc : contains fastqc reports for raw reads
│   ├── fastqscreen : contains plots for fastqscreen screening
│   ├── design : contains info about the run
│   └── pipeline_info
├── peaks : contains bed files for peaks
│   └── with[out]-input
│       ├── normalized : contains normalized and recentered peaks
│       │   └── [no-]idr
│       │       ├── tab
│       │       └── log
│       ├── finalpeaks : contains a copy of final peaks (generally after IDR or merge)
│       ├── saturation_curve : contains saturation curve files and plots
│       │   └── standard
│       │       └── peaks
│       ├── macs2 : contains raw peaks called by macs2
│       │   └── pv*_qv*_bw*_sloc*_extsize*
│       │       ├── log
│       │       ├── xls
│       │       ├── narrowPeak
│       │       └── bed
│       └── bed_shuffle : contains shuffled bed files before peak calling
│           └── trim_q*
├── bwa : contains raw, filtered and parsed reads in bam and bed format
│   ├── filterbam
│   │   └── flag_*
│   │       ├── parse_itr
│   │       │   ├── unclassified
│   │       │   │   ├── bed
│   │       │   │   └── bam
│   │       │   ├── type2
│   │       │   │   ├── bed
│   │       │   │   └── bam
│   │       │   ├── type1
│   │       │   │   ├── bed
│   │       │   │   └── bam
│   │       │   ├── norm_factors
│   │       │   ├── log
│   │       │   ├── flagstat
│   │       │   └── dsDNA
│   │       │       └── bed
│   │       ├── log
│   │       ├── bed
│   │       └── bam
│   ├── bam
│   └── log
├── trimming : contains trimmed fastq files
│   └── trim_fastq
└── idr : contains files for IDR process
    └── with[out]-input
        └── narrowPeak_macs2pv*_macs2qv*1_idr_setup-*
            ├── bfilt : contains blacklist filtered bed files
            │   └── log
            ├── macs2 : contains macs2 narrowpeaks files
            │   └── log
            ├── peaks : contains bed files
            ├── plot : contains IDR plots
            ├── pseudo_replicates : contains bed for pseudo replicates files
            ├── qc
            │   └── log
            └── unthresholded-peaks : contains unthresholded bed files
I recommend looking first at the qc/multiqc/*.multiQC.quality-control.report.html file to get an overview of sequencing, mapping and parsing quality.
Then load the bigwig files in your favorite genome browser or in online IGV.
You can also look at the peaks in peaks/with[out]-input/finalpeaks.
Finally, you can run the ssdspostprocess pipeline to go deeper into the peak analysis.
In an ideal world the pipeline would never crash but, let's face it, it will happen. Here are some clues to help you debug. First, the main log file will give you many precious clues:
Launching `main.nf` [fervent_ptolemy] - revision: e2a189c33c
indicating that the current run has been internally named fervent_ptolemy.
[88/c511ac] process > check_design (input.csv) [100%] 1 of 1 ✔
[d7/59a78c] process > makeScreenConfigFile (SSDS_... [100%] 1 of 1 ✔
[75/9a5a8a] process > crop (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[a7/2b8da0] process > trimming (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[6c/b564b0] process > bwaAlign (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[4d/cadedc] process > filterBam (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[bf/9b1f6a] process > parseITRs (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[55/89204f] process > makeBigwig (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[49/a54378] process > shufBEDs (TEST_IP_R1) [100%] 1 of 1 ✔
[5c/ab370e] process > callPeaks (TEST_IP_R1) [100%] 5 of 5 ✔
[bc/5e7a98] process > samStats (TEST_IP_R1_T1) [100%] 5 of 5 ✔
[da/02474b] process > makeSSreport (TEST_IP_R1_T1) [100%] 1 of 1 ✔
[d0/be7efa] process > makeFingerPrint (SSDS_pipel... [100%] 1 of 1 ✔
[c5/c29f40] process > normalizePeaks (TEST_IP_R1) [100%] 1 of 1 ✔
[77/ceb36f] process > makeSatCurve (SSDS_pipeline... [100%] 1 of 1 ✔
[72/d60ad9] process > general_multiqc (SSDS_pipel... [100%] 1 of 1 ✔
For example, for the trimming process, the associated key is a7/2b8da0, meaning that in the nextflow work directory there will be a folder named a7 containing a folder whose name begins with 2b8da0: this is where the detailed log files for the trimming process are stored. In every such log folder, you can check the following files (remember to use ls -A to display files whose names begin with a dot):
.command.out : output of the process
.command.err : error returned by the process, if any
.command.log : log file of the process
.command.sh : bash script executed for the process
.command.run : nextflow script executed for the process
.command.trace : resources used by the process
.exitcode : exit code of the process (if the process succeeded, it must be 0)
Some processes also write their own *.log file.
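To locate these files for a given process, combine the work directory with the process key, as in this sketch using the a7/2b8da0 trimming key from the example above (the full folder name is completed by shell globbing):
cd work/a7/2b8da0*   # enter the unique work folder of the trimming process
ls -A                # hidden .command.* files are only listed with -A
cat .command.err     # first place to look when a process fails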
There is also a hidden .nextflow.log file created each time the pipeline is run (the latest is named .nextflow.log, older ones are renamed with a numeric suffix, and so on). This log file gives insights into the communication between Nextflow and the job scheduler (e.g. slurm) during the run, such as job IDs, run times and so on. It is located in the folder from which the pipeline is launched.
If a job is killed because it exceeded the resources allocated in the conf/igh.config file (usually, the exit code is 143), increase the corresponding resources and relaunch.
If environment creation fails with the -profile conda execution profile, prefer the singularity profile (see the installation section above).
Do not hesitate to contact me or open an issue if you cannot resolve a bug.
See TODO.md file.
:santa: