Pipeline to call hyper and hypo accessible nucleosomes and nucleosomes with non-canonical structure based on differential MNase-seq data
MIT License

nucMACC

Original paper - https://doi.org/10.1101/2022.12.29.521985

Introduction

nucMACC is an automated pipeline for the analysis of nucleosome positions, accessibility and stability. The pipeline contains two main workflows:

  1. MNaseQC for QC and exploratory analysis
  2. nucMACC for analysis of nucleosome positions, accessibility and stability

Given trimmed paired-end sequencing reads in fastq format, this pipeline will run:

nucMACC is meant to run on pooled replicates in fastq format, whereas MNaseQC uses single replicates. As the MNaseQC and nucMACC workflows have several steps in common, it is recommended to run MNaseQC first and publish the fragment-size-selected bam files using --publishBamFlt. With the --bamEntry option set, a shorter version of the nucMACC workflow can then be run using the generated bam files as input. In this mode, replicates are pooled in an additional step at the beginning.

Get started

Requirements

Installation

You can obtain the pipeline directly from GitHub:

git clone https://github.com/uschwartz/nucMACC.git

Test pipeline

The pipeline comes with a ready-to-use test data set.

nextflow run path2nucMACC/nucMACC --test

Usage

We recommend running the MNaseQC workflow first with --publishBamFlt, then taking its output and running nucMACC with the --bamEntry option.

To execute the pipeline, a samplesheet is required. Its content depends on the workflow to be executed. See the examples in the toyData folder.

MNaseQC workflow:

Sample_Name,path_fwdReads,path_revReads,MNase_U
H4_rep1_6.25U_cut,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_1.fastq.gz,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_2.fastq.gz,6.25
H4_rep2_6.25U_cut,/toyData/H4_rep2_6.25U/H4_rep2_6.25U_cut_1.fastq.gz,/toyData/H4_rep2_6.25U/H4_rep2_6.25U_cut_2.fastq.gz,6.25
H4_rep1_100U_cut,/toyData/H4_rep1_100U/H4_rep1_100U_cut_1.fastq.gz,/toyData/H4_rep1_100U/H4_rep1_100U_cut_2.fastq.gz,100
H4_rep2_100U_cut,/toyData/H4_rep2_100U/H4_rep2_100U_cut_1.fastq.gz,/toyData/H4_rep2_100U/H4_rep2_100U_cut_2.fastq.gz,100

Each row represents a pair of fastq files from a single replicate. Sample names must be unique.
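
Duplicate names are easy to introduce when a samplesheet is edited by hand, so a quick check before launching may help. The following is a minimal sketch and not part of the pipeline: the function name check_unique_names is hypothetical, and it assumes a comma-separated file with one header row and the sample name in the first column, as in the example above.

```shell
# Hypothetical helper (not part of nucMACC): report sample names that occur
# more than once in column 1 of a samplesheet. Assumes one header row.
check_unique_names() {
    # Drop the header, keep column 1, and list names seen more than once.
    dups=$(tail -n +2 "$1" | cut -d, -f1 | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "Duplicate sample names:" >&2
        echo "$dups" >&2
        return 1
    fi
    echo "All sample names are unique"
}
```

It could be called as check_unique_names samplesheet.csv, where samplesheet.csv stands for your own file; a non-zero exit status indicates duplicates.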

nucMACC workflow with --bamEntry:

Sample_Name,replicate,path_mono,path_sub,MNase_U
H4_6.25U,rep1,/toyData/monoNuc/H4_rep1_6.25U_cut_mono.bam,/toyData/subNuc/H4_rep1_6.25U_cut_sub.bam,6.25
H4_6.25U,rep2,/toyData/monoNuc/H4_rep2_6.25U_cut_mono.bam,/toyData/subNuc/H4_rep2_6.25U_cut_sub.bam,6.25
H4_100U,rep1,/toyData/monoNuc/H4_rep1_100U_cut_mono.bam,/toyData/subNuc/H4_rep1_100U_cut_sub.bam,100
H4_100U,rep2,/toyData/monoNuc/H4_rep2_100U_cut_mono.bam,/toyData/subNuc/H4_rep2_100U_cut_sub.bam,100

Each row represents a pair of bam files (path_mono and path_sub) from one replicate. Rows with the same sample name are considered technical replicates and are pooled automatically. Only numerical values are allowed in the last column, MNase_U. If the MNase concentration was constant across experiments and only the digestion time differed, the digestion time can be used instead. It is recommended to use the output of the MNaseQC workflow, which can be obtained by specifying --publishBamFlt. However, it is also possible to enter the pipeline at this point with manually processed bam files.
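
To see which rows will be pooled together and to catch non-numeric MNase_U entries early, the bam-based samplesheet above can be summarised before running the pipeline. This is only a sketch under stated assumptions: summarise_bam_sheet is a hypothetical name, "numerical" is interpreted here as a plain non-negative decimal such as 6.25 or 100, and the check that all rows of one sample share the same MNase_U value is my own assumption, not a documented rule of the pipeline.

```shell
# Hypothetical helper (not part of nucMACC): summarise a bam-based samplesheet.
# Prints "Sample_Name,number_of_rows,MNase_U" for each sample name.
# Fails on a non-numeric MNase_U or (assumption) conflicting values per sample.
summarise_bam_sheet() {
    summary=$(awk -F, '
        NR == 1 || NF == 0 { next }            # skip header and empty lines
        $5 !~ /^[0-9]+(\.[0-9]+)?$/ {
            printf "line %d: MNase_U \"%s\" is not numeric\n", NR, $5 > "/dev/stderr"
            bad = 1
            next
        }
        ($1 in units) && units[$1] != $5 {
            printf "line %d: conflicting MNase_U for %s\n", NR, $1 > "/dev/stderr"
            bad = 1
        }
        { units[$1] = $5; n[$1]++ }            # remember value, count rows
        END {
            for (s in n) print s "," n[s] "," units[s]
            exit (bad ? 1 : 0)
        }
    ' "$1") || return 1
    printf '%s\n' "$summary" | sort
}
```

For the example sheet above, each of the two sample names would be listed with two rows, which are the replicates that get pooled.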

nucMACC workflow starting from fastq files:

Sample_Name,path_fwdReads,path_revReads,MNase_U
H4_rep1_6.25U_cut,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_1.fastq.gz,/toyData/H4_rep1_6.25U/H4_rep1_6.25U_cut_2.fastq.gz,6.25
H4_rep1_100U_cut,/toyData/H4_rep1_100U/H4_rep1_100U_cut_1.fastq.gz,/toyData/H4_rep1_100U/H4_rep1_100U_cut_2.fastq.gz,100

Each row represents a pair of fastq files. If there are several replicates per MNase titration point, the fastq files need to be pooled before starting the pipeline. Only numerical values are allowed in the last column, MNase_U. If the MNase concentration was constant across experiments and only the digestion time differed, the digestion time can be used instead.
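
One common way to pool gzipped fastq replicates is plain concatenation, since concatenated gzip files form a valid multi-member gzip file. The sketch below illustrates this; pool_fastq and the pooled file names are hypothetical, and the README does not prescribe a particular pooling method.

```shell
# Hypothetical helper (not part of nucMACC): pool gzipped fastq replicates
# of one titration point by concatenation.
# Usage: pool_fastq <pooled.fastq.gz> <rep1.fastq.gz> <rep2.fastq.gz> ...
pool_fastq() {
    out=$1
    shift
    # No decompression needed: gzip members can simply be appended.
    cat "$@" > "$out"
}

# Pool forward and reverse reads separately, listing the replicates in the
# SAME order both times so that read pairs stay in sync, for example:
# pool_fastq H4_6.25U_pooled_1.fastq.gz H4_rep1_6.25U_cut_1.fastq.gz H4_rep2_6.25U_cut_1.fastq.gz
# pool_fastq H4_6.25U_pooled_2.fastq.gz H4_rep1_6.25U_cut_2.fastq.gz H4_rep2_6.25U_cut_2.fastq.gz
```

The pooled files can then be entered as a single row per titration point in the samplesheet.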

Execute:

Get help

nextflow run path2nucMACC/nucMACC --help

Contact

Please log all issues/suggestions on the nucMACC GitHub page: https://github.com/uschwartz/nucMACC/issues

Uwe Schwartz: uwe.schwartz@ur.de

Cite

Sara Wernig-Zorc et al., nucMACC: An MNase-seq pipeline to identify structurally altered nucleosomes in the genome. Sci. Adv. 10, eadm9740 (2024). DOI: 10.1126/sciadv.adm9740 (https://doi.org/10.1126/sciadv.adm9740)