Original paper - https://doi.org/10.1101/2022.12.29.521985
nucMACC is an automated pipeline for the analysis of nucleosome positions, accessibility and stability. The pipeline contains two main workflows:
MNaseQC for QC and exploratory analysis
nucMACC for analysis of nucleosome positions, accessibility and stability
Given trimmed paired-end sequencing reads in fastq format, this pipeline will run:
MNaseQC and nucMACC:
FastQC on fastq files
Bowtie2 on fastq files
Qualimap on aligned fragments
DANPOS
deepTools computeMatrix
MultiQC
MNaseQC specific:
deepTools
deepTools
nucMACC specific:
DANPOS
bedtools genomecov
featureCounts
nucMACC is meant to run on pooled replicates in fastq format, whereas MNaseQC uses single replicates. As the MNaseQC and nucMACC workflows have several steps in common, we recommend running MNaseQC first and publishing the fragment-size-selected bam files using --publishBamFlt. Then, with the --bamEntry option set, a shorter version of the nucMACC workflow can be run using the generated bam files as input. In this mode, replicates are pooled in an additional step at the beginning.
Docker and nextflow are required to run the nucMACC pipeline. Additional software used in the pipeline is packaged in Docker containers and is downloaded automatically during the first execution of the pipeline.
nextflow.config
--high_memory
or directly in the nextflow.config
deeptools computeMatrix
For further customisation of TSS plots, we recommend using the deepTools package directly on the bigwig files provided by the pipeline.
You can obtain the pipeline directly from GitHub:
git clone https://github.com/uschwartz/nucMACC.git
The pipeline comes with a ready-to-use test data set.
nextflow run path2nucMACC/nucMACC --test
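Since Docker and nextflow are both required, it can help to confirm that they are on the PATH before launching the test run. The following is a minimal sketch; the function name check_tools is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of nucMACC): report which of the given
# command-line tools cannot be found on the PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # Return 0 only if every tool was found.
  return "$missing"
}
```

Calling `check_tools docker nextflow` prints one line per missing tool and returns a non-zero status if any is absent.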
We recommend running the MNaseQC workflow first with --publishBamFlt specified. Then take the output and run nucMACC with the --bamEntry option.
To execute the pipeline, a samplesheet is required. Its content depends on the workflow to be executed. See the examples in the toyData folder.
Workflow:
MNaseQC (example toyData/input_replicates.csv)
Sample_Name,path_fwdReads,path_revReads,MNase_U
H4_rep1_6.25U_cut,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_1.fastq.gz,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_2.fastq.gz,6.25
H4_rep2_6.25U_cut,/toyData/H4_rep2_6.25U/H4_rep2_6.25U_cut_1.fastq.gz,/toyData/H4_rep2_6.25U/H4_rep2_6.25U_cut_2.fastq.gz,6.25
H4_rep1_100U_cut,/toyData/H4_rep1_100U/H4_rep1_100U_cut_1.fastq.gz,/toyData/H4_rep1_100U/H4_rep1_100U_cut_2.fastq.gz,100
H4_rep2_100U_cut,/toyData/H4_rep2_100U/H4_rep2_100U_cut_1.fastq.gz,/toyData/H4_rep2_100U/H4_rep2_100U_cut_2.fastq.gz,100
Each row represents a pair of fastq files. Here unique sample names are required.
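A sample sheet can be checked before it is passed to the pipeline. The sketch below is an illustration only: check_qc_sheet is a hypothetical helper, it assumes the four-column layout shown above, and it also applies the numeric MNase_U requirement that this README states for the nucMACC sheets, which is an assumption for the MNaseQC sheet.

```shell
# Hypothetical helper (not part of nucMACC): validate an MNaseQC sample sheet.
# Assumes a header line followed by rows with exactly four comma-separated columns.
check_qc_sheet() {
  awk -F',' '
    NR == 1 { next }                     # skip the header line
    NF == 0 { next }                     # ignore empty lines
    NF != 4 {
      printf "line %d: expected 4 columns, got %d\n", NR, NF
      bad = 1
      next
    }
    seen[$1]++ {                         # true from the second occurrence on
      printf "line %d: duplicate Sample_Name %s\n", NR, $1
      bad = 1
    }
    $4 !~ /^[0-9]+(\.[0-9]+)?$/ {        # assumption: MNase_U must be numeric here too
      printf "line %d: non-numeric MNase_U %s\n", NR, $4
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

`check_qc_sheet sample_sheet.csv` returns 0 for a clean sheet and prints one message per problem otherwise.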
nucMACC with --bamEntry (example toyData/sub_input.csv)
Sample_Name,replicate,path_mono,path_sub,MNase_U
H4_6.25U,rep1,/toyData/monoNuc/H4_rep1_6.25U_cut_mono.bam,/toyData/subNuc/H4_rep1_6.25U_cut_sub.bam,6.25
H4_6.25U,rep2,/toyData/monoNuc/H4_rep2_6.25U_cut_mono.bam,/toyData/subNuc/H4_rep2_6.25U_cut_sub.bam,6.25
H4_100U,rep1,/toyData/monoNuc/H4_rep1_100U_cut_mono.bam,/toyData/subNuc/H4_rep1_100U_cut_sub.bam,100
H4_100U,rep2,/toyData/monoNuc/H4_rep2_100U_cut_mono.bam,/toyData/subNuc/H4_rep2_100U_cut_sub.bam,100
Each row represents a pair of bam files for one replicate: the mono-nucleosome fragments (path_mono) and the sub-nucleosome fragments (path_sub). Rows with the same sample name are considered technical replicates and pooled automatically. Only numerical values are allowed in the last column, MNase_U. The duration of the MNase digestion can be used instead if the MNase concentration was constant across experiments and only the digestion time differed. We recommend using the output of the MNaseQC workflow, which is obtained by specifying --publishBamFlt. However, it is also possible to enter the pipeline at this point with manually processed bam files.
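The --bamEntry sheet can be derived from the MNaseQC sheet when file names follow a fixed pattern. The sketch below is only an illustration under assumptions taken from the toyData examples: sample names look like `<condition>_rep<N>_<titration>_cut`, and the bam files are named `<Sample_Name>_mono.bam` and `<Sample_Name>_sub.bam` inside monoNuc and subNuc folders. The function name make_bam_sheet is hypothetical, and the actual output layout of MNaseQC should be checked before relying on it.

```shell
# Hypothetical helper (not part of nucMACC): build a --bamEntry sample sheet
# from an MNaseQC sample sheet.
# Assumptions (taken from the toyData examples, not from the pipeline itself):
#   - sample names contain "_rep<N>" and end in "_cut"
#   - bam files are <dir>/monoNuc/<name>_mono.bam and <dir>/subNuc/<name>_sub.bam
make_bam_sheet() {
  awk -F',' -v OFS=',' -v dir="$2" '
    NR == 1 { print "Sample_Name,replicate,path_mono,path_sub,MNase_U"; next }
    NF >= 4 {
      name = $1
      rep = ""
      if (match(name, /_rep[0-9]+/)) {
        rep = substr(name, RSTART + 1, RLENGTH - 1)   # e.g. "rep1"
      }
      pooled = name
      sub(/_rep[0-9]+/, "", pooled)                   # drop the replicate tag
      sub(/_cut$/, "", pooled)                        # drop the trimming suffix
      print pooled, rep, dir "/monoNuc/" name "_mono.bam", dir "/subNuc/" name "_sub.bam", $4
    }
  ' "$1"
}
```

For example, `make_bam_sheet toyData/input_replicates.csv /toyData > sub_input.csv` writes the new sheet, with rows sharing a Sample_Name so that they are pooled as replicates.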
nucMACC (example toyData/input.csv)
Sample_Name,path_fwdReads,path_revReads,MNase_U
H4_rep1_6.25U_cut,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_1.fastq.gz,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_2.fastq.gz,6.25
H4_rep1_100U_cut,/toyData/H4_rep1_100U/H4_rep1_100U_cut_1.fastq.gz,/toyData/H4_rep1_100U/H4_rep1_100U_cut_2.fastq.gz,100
Each row represents a pair of fastq files. If there are several replicates per MNase titration point, the fastq files need to be pooled before starting the pipeline. Only numerical values are allowed in the last column, MNase_U. The duration of the MNase digestion can be used instead if the MNase concentration was constant across experiments and only the digestion time differed.
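One common way to pool gzipped fastq replicates is plain concatenation, since concatenated gzip files form a valid gzip file. The sketch below assumes this approach; pool_fastq is a hypothetical helper and the file names in the usage note are placeholders.

```shell
# Hypothetical helper (not part of nucMACC): pool gzipped fastq files by
# concatenation. Usage: pool_fastq <output.fastq.gz> <input1.fastq.gz> [...]
pool_fastq() {
  if [ "$#" -lt 2 ]; then
    echo "usage: pool_fastq <output> <input> [<input> ...]"
    return 1
  fi
  out="$1"
  shift
  # gzip members can be concatenated directly; no decompression is needed.
  cat "$@" > "$out"
}
```

Forward and reverse reads must be pooled separately and with the replicates in the same order, so that read pairs stay matched, for example `pool_fastq pooled_1.fastq.gz rep1_1.fastq.gz rep2_1.fastq.gz` followed by `pool_fastq pooled_2.fastq.gz rep1_2.fastq.gz rep2_2.fastq.gz`.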
Execute:
MNaseQC
nextflow run path2nucMACC/nucMACC \
--analysis 'MNaseQC' \
--csvInput 'sample_sheet.csv' \
--outDir <OUTDIR> \
--genomeIdx 'Bowtie2Index/genome' \
--genomeSize 119481543 \
--genome 'genome.fa' \
--publishBamFlt \
--blacklist 'blacklisted_regions.bed' \
--TSS 'genes.gtf'
All options, except --publishBamFlt, --blacklist, and --TSS, are required.
nucMACC with --bamEntry
nextflow run path2nucMACC/nucMACC \
--analysis 'nucMACC' \
--csvInput 'sample_sheet.csv' \
--outDir <OUTDIR> \
--genomeIdx 'Bowtie2Index/genome' \
--genomeSize 119481543 \
--genome 'genome.fa' \
--bamEntry \
--TSS 'genes.gtf'
All options, except --TSS, are required.
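The two recommended steps can be chained in a small wrapper. This is a sketch rather than part of the pipeline: run and run_two_step are hypothetical names, the genome paths are the placeholders used above, and writing each step to its own subfolder of the output directory is an assumption. Only options shown in this README are used.

```shell
# Hypothetical wrapper (not part of nucMACC). With DRY_RUN=1 the commands are
# printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: run_two_step <MNaseQC sheet> <bamEntry sheet> <output dir>
run_two_step() {
  qc_sheet="$1"
  bam_sheet="$2"
  out_dir="$3"

  # Step 1: MNaseQC on single replicates, publishing the filtered bam files.
  run nextflow run path2nucMACC/nucMACC \
    --analysis 'MNaseQC' \
    --csvInput "$qc_sheet" \
    --outDir "$out_dir/MNaseQC" \
    --genomeIdx 'Bowtie2Index/genome' \
    --genomeSize 119481543 \
    --genome 'genome.fa' \
    --publishBamFlt || return 1

  # Step 2: nucMACC starting from the bam files listed in the second sheet.
  run nextflow run path2nucMACC/nucMACC \
    --analysis 'nucMACC' \
    --csvInput "$bam_sheet" \
    --outDir "$out_dir/nucMACC" \
    --genomeIdx 'Bowtie2Index/genome' \
    --genomeSize 119481543 \
    --genome 'genome.fa' \
    --bamEntry
}
```

Running `DRY_RUN=1 run_two_step input_replicates.csv sub_input.csv results` prints both commands for inspection. The second sample sheet must point to the bam files produced by the first step, so its paths have to match the actual MNaseQC output.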
nextflow run path2nucMACC/nucMACC --help
Please log all issues/suggestions on the nucMACC GitHub page: https://github.com/uschwartz/nucMACC/issues
Uwe Schwartz: uwe.schwartz@ur.de
Sara Wernig-Zorc et al., nucMACC: An MNase-seq pipeline to identify structurally altered nucleosomes in the genome. Sci. Adv. 10, eadm9740 (2024). DOI: 10.1126/sciadv.adm9740 (https://doi.org/10.1126/sciadv.adm9740)