
🐭 hotSSDS nextflow pipeline version 2.0 🐟🌈

Parse, align and call hotspots from single-stranded DNA reads originating from SSDS (Single-Stranded DNA Sequencing)

Context

The SSDS method was originally published by Khil et al., 2012.
Its objective is to map double-strand breaks (DSBs) along the genome.
In this method, chromatin is extracted from adult testes and immunoprecipitated with an antibody against the DMC1 protein, a meiosis-specific recombinase. DMC1 covers the single-stranded DNA that results from the resection of DSBs. SSDS exploits the ability of single-stranded DNA to form hairpins. The pipeline processes these specific data and identifies recombination hotspots.

See this review by Frédéric Baudat, Yukiko Imai and Bernard de Massy to learn more about meiotic recombination.

Pipeline overview

This pipeline is based on the SSDS pipeline and the SSDS call peaks pipeline by Kevin Brick, PhD (NIH), updated and adapted to the IGH cluster.
See the initial paper and technical paper.
The pipeline uses Nextflow (> 20.04.1).
Briefly, the updates from SSDS pipeline version 1.8_NF include conda/mamba/singularity/docker execution profiles, input modifications, the addition of peak calling, post-processing and an IDR procedure, and global Nextflow homogenization.

The main steps of the pipeline include:

- raw read trimming and quality control
- alignment to the reference genome with a custom BWA
- parsing of the aligned reads into ssDNA type 1, ssDNA type 2, dsDNA and unclassified fragments
- bigwig coverage track generation
- peak calling with MACS2, with optional saturation curve and IDR analysis
- reporting with MultiQC

In detail, the pipeline is composed of 26 processes.

Requirements

The hotSSDS nextflow pipeline requires Nextflow DSL1 version 20.10.0, and at least one of the following: Singularity, Docker, Conda or Mamba (matching the available execution profiles).

The preferred execution profile is Singularity, as it ensures full portability.

Nextflow can easily be installed using the conda package manager (https://anaconda.org/bioconda/nextflow).

Downloading the pipeline preferably requires git; otherwise it can be downloaded using a command-line program for retrieving files from the Internet, such as wget. The hotSSDS pipeline relies on the pre-existence of genome references on your system. If you need to download them, bwa and samtools will be required as well.
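
For example, a minimal installation sketch using bioconda (the environment name is arbitrary; pin the Nextflow version required by the pipeline):

conda create -n nextflow-env -c bioconda -c conda-forge nextflow=20.10.0
conda activate nextflow-env
nextflow -version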

Download the pipeline

Using git (recommended)

git clone https://github.com/jajclement/hotSSDS.git
cd hotSSDS

Or download the zip archive using wget, then unzip it

wget https://github.com/jajclement/hotSSDS/archive/refs/heads/master.zip
unzip master.zip
mv hotSSDS-master hotSSDS
cd hotSSDS

Download Singularity images

Singularity images encapsulate all required software and dependencies for the different steps of the pipeline, making it portable across systems. As they can be large, they are not included in the pipeline git repository.

Before running the pipeline with the Singularity execution profile, it is necessary to download the images from the Zenodo 'hotSSDS Pipeline Singularity Images' open repository.

To do this, two options are available, depending on whether the computing environment on which the pipeline will be executed has Internet access (see point a) or not (see point b).

a. Run the pipeline using the option --get_sif

The --get_sif option launches a "dry run" that checks for the Singularity images in the pipeline directory and downloads any that are missing. Once the download is complete, the pipeline stops. It can then be run without the --get_sif option to perform hotSSDS analyses.

nextflow run main.nf -c conf/cluster.config \
    -params-file conf/test.json \
    -profile test,<singularity|mamba|conda|docker> \
    --get_sif >& get_sif_main_log.txt 2>&1

b. Download all Singularity images independently

Download all 10 .sif files from the Zenodo open repository at https://zenodo.org/record/7783473 and place them in the hotSSDS/containers folder so that the final directory structure is:

containers/
├── bam-box-1.0
│   ├── bam-box_1.0.sif
│   ├── Dockerfile
│   └── environment.yml
├── bigwig-box-1.0
│   ├── bigwig-box-1.0.sif
│   ├── Dockerfile
│   └── environment.yml
├── frip-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── frip-box_1.0.sif
├── idr-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── idr-box_1.0.sif
├── multiqc-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── multiqc-box_1.0.sif
├── peak-calling-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── peak-calling-box_1.0.sif
├── plot-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── plot-box_1.0.sif
├── python-3.8
│   ├── environment.yml
│   └── python-3.8.sif
├── ssds-qc-box-1.0
│   ├── Dockerfile
│   ├── environment.yml
│   └── ssds-qc-box_1.0.sif
└── trimming-box-1.0
    ├── Dockerfile
    ├── environment.yml
    └── trimming-box_1.0.sif
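
Each image can also be fetched individually with wget; a sketch for a single image (the URL base matches the default --url_sif parameter, and file names must match the listing above):

wget -P containers/bam-box-1.0 https://zenodo.org/record/7783473/files/bam-box_1.0.sif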

Configure computing parameter files

Edit the hotSSDS/conf/cluster.config file to adjust the parameters to your computing cluster.
In particular, the sections defining the executor (scheduler), queues and container settings are expected to be overwritten.
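
As a purely illustrative sketch (the actual sections in conf/cluster.config may differ; the executor name and queue below are placeholders for your scheduler):

process {
    executor = 'pbspro'      // or 'slurm', 'sge', ... depending on your scheduler
    queue    = 'normal'      // a queue/partition available on your cluster
}
singularity {
    enabled    = true
    autoMounts = true        // automatically mount host paths inside containers
}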

If needed, edit hotSSDS/conf/resources.config to adjust resources for specific processes. To do so, edit cpus/memory/time in the PROCESSES SPECIFIC RESSOURCES REQUIREMENTS section. You can also add specific computing queues to some categories.
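
For example, a hypothetical override for one resource category (the label name is illustrative; use the categories defined in the file):

process {
    withLabel: 'process_high' {   // hypothetical category name
        cpus   = 16
        memory = '64 GB'
        time   = '48h'
        queue  = 'highmem'        // optional: a dedicated queue
    }
}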

Note that many institutes have such a configuration file referenced in the nf-core/configs repository, which you can download and adapt to the hotSSDS pipeline.

Prepare input file

The pipeline only processes paired-end data in fastq(.gz) format. The input data must be described in an input csv file with the following 6 fields:

group,replicate,fastq_1,fastq_2,antibody,control
DMC1-chip-WT,1,/path/to/data/SRR1035576_R1.fastq.gz,/path/to/data/SRR1035576_R2.fastq.gz,antiDMC1,Input-WT
DMC1-chip-WT,2,/path/to/data/SRR1035577_R1.fastq.gz,/path/to/data/SRR1035577_R2.fastq.gz,antiDMC1,Input-WT
DMC1-chip-KO,1,/path/to/data/SRR1035578_R1.fastq.gz,/path/to/data/SRR1035578_R2.fastq.gz,antiDMC1,Input-KO
DMC1-chip-KO,2,/path/to/data/SRR1035579_R1.fastq.gz,/path/to/data/SRR1035579_R2.fastq.gz,antiDMC1,Input-KO
Input-WT,1,/path/to/data/SRR1035580_R1.fastq.gz,/path/to/data/SRR1035580_R2.fastq.gz,,
Input-KO,1,/path/to/data/SRR1035581_R1.fastq.gz,/path/to/data/SRR1035581_R2.fastq.gz,,

Replicate samples must have the same "group" ID, the same "antibody" and the same control group.
If a sample has no associated input control sample, leave the "control" field empty.
Control (input) samples must have the last 2 fields ("antibody" and "control") empty.
There must be no empty line at the end of the csv file.

Edit/create parameter file

The complete list of parameters is accessible through the command:

nextflow run main.nf --help
N E X T F L O W  ~  version 20.10.0
Launching `main.nf` [reverent_perlman] - revision: 5244f2a6f5
=============================================================================
  hotSSDS pipeline version 2.0 : Align, parse and call hotspots from SSDNA
=============================================================================
    Usage:

    nextflow run main.nf -c conf/cluster.config -params-file conf/mm10.json --inputcsv tests/fastq/input.csv --name "runtest" --trim_cropR1 36 --trim_cropR2 40 --with_trimgalore -profile singularity -resume

    Runs with Nextflow DSL1 v20.10.0
=============================================================================
Input data parameters:
    --inputcsv                  FILE    PATH TO INPUT CSV FILE (template and default : hotSSDS/tests/fastq/input.csv)
    -params-file                FILE    PATH TO PARAMETERS JSON FILE (template and default : hotSSDS/conf/mm10.json)
    --genomebase                DIR     PATH TO REFERENCE GENOMES
    --genome                    STRING  REFERENCE GENOME NAME (must correspond to an existing genome in your config file, default : "mm10")
    --genomedir                 DIR     PATH TO GENOME DIRECTORY (required if your reference genome is not present in your config file)
    --genome_name               STRING  REFERENCE GENOME NAME (e.g. "mm10", required if your reference genome is not present in your config file)
    --genome_fasta              FILE    PATH TO GENOME FASTA FILE WITH PREEXISTING INDEX FILES FOR BWA (required if your reference genome is not present in your config file)
    --fai                       FILE    PATH TO GENOME FAI INDEX FILE (required if your reference genome is not present in your config file)
    --genome2screen             STRING  GENOMES TO SCREEN FOR FASTQC SCREENING (default : ['mm10','hg19','dm3','dm6','hg38','sacCer2','sacCer3'], comma separated list of genomes to screen reads for contamination, names must correspond to existing genomes in your config file)
    --chrsize                   FILE    Chromosome sizes file, default : ssdsnextflowpipeline/data/mm10/mm10.chrom.sizes (downloaded from https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.chrom.sizes 2021-01-11)
    --hotspots                  DIR     PATH TO HOTSPOTS FILES DIRECTORY (set to "None" if none provided ; default :  hotSSDS/data/hotspots/mm10/hotspots)
    --blacklist                 FILE    PATH TO BLACKLIST BED FILE FOR PEAK CALLING AND IDR (set to "None" if none provided ; default : hotSSDS/data/blacklist/mm10/blackList.bed)

Output and temporary directory parameters:
    --name                      STRING  ANALYSIS NAME (default : "hotSSDSPipeline")
    --outdir                    DIR     PATH TO OUTPUT DIRECTORY (default : hotSSDS/{params.name}.outdir/02_results)
    --publishdir_mode           STRING  MODE FOR EXPORTING PROCESS OUTPUT FILES TO OUTPUT DIRECTORY (default : "copy", must be "symlink", "rellink", "link", "copy", "copyNoFollow","move", see https://www.nextflow.io/docs/latest/process.html)

Pipeline dependencies:
    --src                       DIR     PATH TO SOURCE DIRECTORY (default : hotSSDS/bin ; contains perl scripts)
    --custom_bwa                EXE     PATH TO CUSTOM BWA EXEC (default : hotSSDS/bin/bwa_0.7.12)
    --custom_bwa_ra             EXE     PATH TO CUSTOM BWA_SRA EXEC (default : hotSSDS/bin/bwa_ra_0.7.12)

Trimming parameters:
    --with_trimgalore           BOOL    Use trim-galore instead of Trimmomatic for quality trimming process (default : false)
    --trimgalore_adapters       FILE    trim-galore : PATH TO ADAPTERS FILE (default : none)
    --trimg_quality             INT     trim-galore : minimum quality (default 10)
    --trimg_stringency          INT     trim-galore : trimming stringency (default 6)
    --trim_minlen               INT     trimmomatic : minimum length of reads after trimming (default 25)
    --trim_cropR1               INT     fastx : Cut the R1 read to that specified length (default 50)
    --trim_cropR2               INT     fastx : Cut the R2 read to that specified length (default 50)
    --trim_slidingwin           STRING  trimmomatic : perform a sliding window trimming, cutting once the average quality within the window falls below a threshold (default "4:15")
    --trim_illumina_clip        STRING  trimmomatic : Cut adapter and other illumina-specific sequences from the read (default "2:20:10")
    --trimmomatic_adapters      FILE    PATH TO ADAPTERS FILE FOR TRIMMOMATIC (default hotSSDS/data/TruSeq2-PE.fa, special formatting see http://www.usadellab.org/cms/?page=trimmomatic)

Mapping parameters:
    --with_multimap             BOOL    Keep multimapping reads from bam (default : false)
    --bamPGline                 STRING  bam header (default '@PG\tID:ssDNAPipeline2.0_PAUFFRET')
    --filtering_flag            INT     SAM flag for filtering bam files (default : 2052 ; see https://broadinstitute.github.io/picard/explain-flags.html)
    --picard_min_distance       INT     Picard parameter for marking duplicates (--MINIMUM_DISTANCE) :  width of the window to search for duplicates of a given alignment, default : -1 (twice the first read's read length)
    --picard_optdup_distance    INT     Picard parameter for marking duplicates (--OPTICAL_DUPLICATE_PIXEL_DISTANCE) : The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform (HiSeq2500). For the patterned flowcell models (Novaseq 6000), 2500 is more appropriate, default : 100
    --get_supp                  BOOL    Publish bam files for supplementary alignments, default : false

Bigwig parameter:
    --bigwig_profile            STRING  Bigwig profile using  bedtools (normalization by total library size) : "T1" will produce bigwig for T1 bed files only, one per replicates ; "T12" will also produce bigwig for merged T1+T2, one per replicates ; "T1rep" will also produce T1 bigwig for merged replicates ; "T12rep" will also produce T1+T2 bigwig for merged replicates (default : "T1")
    --kbrick_bigwig             BOOL    Compute bigwig files ; FR bigwig files and coverage plots as in original pipeline by Kevin Brick using deeptools with FPKM normalization (default : false)
    --binsize                   INT     Deeptools binsize parameter (used only if kbrick_bigwig is TRUE ; default : 50)

Peak calling parameters:
    --with_control              BOOL    Use input control files for peak calling analysis (default : false)
    --satcurve                  BOOL    Plot saturation curve (default : false)
    --sctype                    STRING  Saturation curve type (either 'minimal', 'standard' or 'expanded' ; default : 'standard')
    --reps                      INT     Number of iterations for saturation curve (default : 3)
    --bed_trimqual              INT     Mapping quality threshold for bed filtering (default : 30)
    --macs_bw                   INT     Macs2 callpeak bandwidth parameter (default : 1000)
    --macs_slocal               INT     Macs2 callpeak slocal parameter (default : 5000)
    --macs_extsize              INT     Macs2 callpeak extsize parameter (default : 800)
    --macs_qv                   FLOAT   Macs2 callpeak q-value parameter (default : 0.1)
    --macs_pv                   FLOAT   Macs2 callpeak p-value parameter, if not -1, will overrule macs_qv, see macs2 doc (default : -1)
    --no_chrY                   BOOL    Filter out chromosomeY peaks from final peak bed files (default : true)

Optional IDR analysis parameters (ENCODE procedure, see https://github.com/ENCODE-DCC/chip-seq-pipeline2) :
    --with_idr                  BOOL    Perform IDR analysis, only possible if nb_replicates=2 (default : false)
    --nb_replicates             INT     Number of replicates per sample (default : 2)
    --idr_peaktype              STRING  The peak file format for IDR (narrowPeak, regionPeak or broadPeak, default : "regionPeak")
    --idr_setup                 STRING  Threshold profile for idr. This will define the thresholds for true replicates, pool replicates, self replicates r1 and self replicates r2. Profile "auto" is based on ENCODE guidelines and profile "custom" allows to set custom thresholds (see parameters --idr_threshold_r1 --idr_threshold_r2 --idr_threshold_truerep and --idr_threshold_poolrep ; default : auto)
    --idr_threshold_r1          FLOAT   idr threshold for self replicates r1 (used if --idr_setup is "custom" only ; default : 0.05)
    --idr_threshold_r2          FLOAT   idr threshold for self replicates r2 (used if --idr_setup is "custom" only ; default : 0.05)
    --idr_threshold_truerep     FLOAT   idr threshold for true replicates (used if --idr_setup is "custom" only ; default : 0.05)
    --idr_threshold_poolrep     FLOAT   idr threshold for pooled replicates (used if --idr_setup is "custom" only ; default : 0.01)
    --idr_rank                  INT     p.value or q.value (default : p.value)
    --idr_filtering_pattern     STRING  Regex for filtering bed files (default :"chr[1-9X]+" for mouse ; set ".*" to keep everything)
    --idr_macs_qv               FLOAT   Macs2 callpeak q-value parameter (default : -1)
    --idr_macs_pv               FLOAT   Macs2 callpeak p-value parameter, if not -1, will overrule macs_qv, see macs2 doc (default : 0.1)

QC parameters:
    --with_ssds_multiqc         BOOL    RUN SSDS MULTIQC (default : true)
    --multiqc_configfile        FILE    OPTIONAL : PATH TO MULTIQC CUSTOM CONFIG FILE (default : hotSSDS/conf/multiqc_config.yaml)

Nextflow Tower parameter:
    -with-tower                 BOOL    Enable job monitoring with Nextflow tower (https://tower.nf/)

Singularity images parameters:
    --get_sif                   BOOL    [REQUIRE INTERNET ACCESS] Check and download singularity images if necessary (if true, the pipeline stops after download; once downloading is done, relaunch the pipeline with false ; default: false)
    --url_sif                   URL     URL TO PUBLIC SINGULARITY IMAGES REPOSITORY (default : https://zenodo.org/record/7783473/files)

Pipeline parameters can be set in two different ways: in a JSON parameter file passed with -params-file, or directly on the command line (command-line values take precedence).

For Mus musculus based analyses, the parameter file hotSSDS/conf/mm10.json contains default parameters that can be overwritten.

One important thing to note: in Nextflow command lines, native options are preceded by a single dash (e.g. -profile), while parameters specific to the hotSSDS pipeline are preceded by two dashes (e.g. --genome 'mm10').
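
For example, a minimal JSON parameter file might look like this (values are illustrative only; see the help above for the full parameter list):

{
    "genome": "mm10",
    "with_control": true,
    "nb_replicates": 2,
    "bigwig_profile": "T12rep"
}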

Launch the pipeline

Once you have set the computing config file and the parameter file, you can launch the pipeline using the following command line:

nextflow run main.nf -c conf/cluster.config \
    -params-file conf/mm10.json \
    --inputcsv /path/to/input.csv \
    -profile <singularity|mamba|conda|docker> \
    --name "My_workflow_name" >& main_log.txt 2>&1

It is highly recommended to launch the command as a batch job on the computing cluster, as its execution will take time and computing resources. It is also recommended to redirect the output of the main nextflow command line to a clearly identified log file, which will be useful for monitoring the pipeline execution.
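
As a sketch, a PBS Pro submission script could look like this (the directives, queue and resources are assumptions to adapt, or to translate for another scheduler):

#!/bin/bash
#PBS -N hotSSDS
#PBS -q normal
#PBS -l select=1:ncpus=1:mem=8gb
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR
# load or activate your Nextflow environment here (e.g. conda activate nextflow-env)
nextflow run main.nf -c conf/cluster.config \
    -params-file conf/mm10.json \
    --inputcsv /path/to/input.csv \
    -profile singularity \
    --name "My_workflow_name" >& main_log.txt 2>&1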

The main parameters that need to be set are:

- -c : the computing configuration file
- -params-file : the analysis parameter file
- --inputcsv : the input csv file describing the samples
- -profile : the execution profile (singularity, mamba, conda or docker)
- --name : the analysis name

Run a short test

A small dataset can be used to test if the pipeline is correctly running on your system.
To do so, run :

nextflow run main.nf -c conf/cluster.config \
    -params-file conf/test.json \
    -profile test,<singularity|mamba|conda|docker> >& test_main_log.txt 2>&1

This test run should take approximately 5 minutes to complete.

On completion, the end of the main log file test_main_log.txt should look like:

executor >  pbspro (25)
[aa/ff63f4] process > check_design (input.csv)          [100%] 1 of 1 ✔
[1f/c81a5f] process > makeScreenConfigFile (TEST_SSDS)  [100%] 1 of 1 ✔
[f1/341fad] process > crop (TEST_IP_R1_T1)              [100%] 1 of 1 ✔
[fe/67d328] process > trimming (TEST_IP_R1_T1)          [100%] 1 of 1 ✔
[cb/a77283] process > bwaAlign (TEST_IP_R1_T1)          [100%] 1 of 1 ✔
[38/e2e8c6] process > filterBam (TEST_IP_R1_T1)         [100%] 1 of 1 ✔
[af/05397c] process > parseITRs (TEST_IP_R1_T1)         [100%] 1 of 1 ✔
[d1/b74fbc] process > makeBigwig (TEST_IP_R1_T1)        [100%] 1 of 1 ✔
[1f/256f21] process > shufBEDs (TEST_IP_R1)             [100%] 1 of 1 ✔
[3c/f2b936] process > callPeaks (TEST_IP_R1)            [100%] 5 of 5 ✔
[43/655990] process > samStats (TEST_IP_R1_T1)          [100%] 5 of 5 ✔
[18/200309] process > makeSSreport (TEST_IP_R1_T1)      [100%] 1 of 1 ✔
[8f/19ef51] process > makeFingerPrint (TEST_SSDS)       [100%] 1 of 1 ✔
[c8/a77ede] process > ssds_multiqc (TEST_IP_R1_T1)      [100%] 1 of 1 ✔
[85/6daa72] process > normalizePeaks (TEST_IP_R1)       [100%] 1 of 1 ✔
[6a/122837] process > makeSatCurve (TEST_SSDS)          [100%] 1 of 1 ✔
[a7/a00c9c] process > general_multiqc (TEST_SSDS)       [100%] 1 of 1 ✔
Completed at: 10-Mar-2023 13:37:48
Duration    : 3m 57s
CPU hours   : 0.3
Succeeded   : 25

Monitor the pipeline

Tree overview of the output folder composition [DEPRECATED]:

.
├── bigwig                                              : contains bigwig files according to the parameter set
│   └── T1
│       └── log
├── qc                                                  : contains quality control files, pictures and reports
│   ├── multiqc                                         : contains a summary of QC stats for all processes
│   │   ├── *.multiQC.quality-control.report_plots
│   │   │   ├── svg
│   │   │   ├── png
│   │   │   └── pdf
│   │   └── *.multiQC.quality-control.report_data
│   ├── samstats                                        : contains mapping statistics tabs
│   ├── ssds                                            : contains tabs and plots about SSDS parsing statistics
│   ├── flagstat                                        : contains mapping statistics files
│   ├── fingerprint                                     : contains fingerprint plots
│   ├── trim_fastqc                                     : contains fastqc reports for trimmed reads
│   ├── raw_fastqc                                      : contains fastqc reports for raw reads
│   ├── fastqscreen                                     : contains plots for fastqscreen screening
│   └── design                                          : contains info about the run
│       └── pipeline_info
├── peaks                                               : contains bed files for peaks
│   └── with[out]-input
│       ├── normalized                                  : contains normalized and recentered peaks
│       │   └── [no-]idr
│       │       ├── tab
│       │       └── log
│       ├── finalpeaks                                  : contains a copy of final peaks (generally after IDR or merge)
│       ├── saturation_curve                            : contains saturation curve files and plots
│       │   └── standard
│       │       └── peaks
│       ├── macs2                                       : contains raw peaks called by macs2
│       │   └── pv*_qv*_bw*_sloc*_extsize*
│       │       ├── log
│       │       ├── xls
│       │       ├── narrowPeak
│       │       └── bed
│       └── bed_shuffle                                 : contains shuffled bed files before peak calling
│           └── trim_q*
├── bwa                                                 : contains raw, filtered and parsed reads in bam and bed format
│   ├── filterbam
│   │   └── flag_*
│   │       ├── parse_itr
│   │       │   ├── unclassified
│   │       │   │   ├── bed
│   │       │   │   └── bam
│   │       │   ├── type2
│   │       │   │   ├── bed
│   │       │   │   └── bam
│   │       │   ├── type1
│   │       │   │   ├── bed
│   │       │   │   └── bam
│   │       │   ├── norm_factors
│   │       │   ├── log
│   │       │   ├── flagstat
│   │       │   └── dsDNA
│   │       │       └── bed
│   │       ├── log
│   │       ├── bed
│   │       └── bam
│   └── bam
│       └── log
├── trimming                                            : contains trimmed fastq files
│   └── trim_fastq
└── idr                                                 : contains files for IDR process
    └── with[out]-input
        └── narrowPeak_macs2pv*_macs2qv*1_idr_setup-*
            ├── bfilt                                   : contains blacklist-filtered bed files
            ├── log
            ├── macs2                                   : contains macs2 narrowPeak files
            │   └── log
            ├── peaks                                   : contains bed files
            ├── plot                                    : contains IDR plots
            ├── pseudo_replicates                       : contains bed files for pseudo-replicates
            ├── qc
            │   └── log
            └── unthresholded-peaks                     : contains unthresholded bed files

I recommend looking first at the qc/multiqc/*.multiQC.quality-control.report.html file to get an overview of sequencing, mapping and parsing quality.
Then load the bigwig files in your favorite genome browser or in online IGV.
You can also look at the peaks in peaks/with[out]-input/finalpeaks.
Finally, you can run the ssdspostprocess pipeline to go deeper into peak analysis.

Debug

In an ideal world the pipeline would never crash, but let's face it, it will happen. Here are some clues to help you debug. First, the main log file will give you many precious clues.
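
Beyond the main log, standard Nextflow tooling helps locate a failure (a sketch, to run from the pipeline launch directory):

nextflow log                                    # list previous runs
nextflow log <run_name> -f name,status,workdir  # locate each task's work directory
# inspect .command.log, .command.err and .command.sh in the failing task's work
# directory, fix the issue, then relaunch the pipeline with -resume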

Do not hesitate to contact me or open an issue if you cannot resolve a bug.

Notes and future developments

See the TODO.md file.

🎅