jsalignon / cactus

Chromatin ACcessibility and Transcriptomics Unifying Software
MIT License

Overview of Cactus.
(a) Key features. Icons were adapted from Servier Medical Art and the Database Center for the life sciences/TogoTV. (b) Simplified workflow. (c) Example of enrichment analysis performed for a gene showing an increase in both chromatin accessibility and gene expression upon treatment. Enrichment of internal GRs and GSs indicates enrichment of GRs and GSs in other GRs and GSs generated by the pipeline. Black lines and blue circles represent DNA and nucleosomes, respectively. Orange lines represent mRNA molecules. (d) Sub-workflow showing the creation of DASs. Dotted arrows indicate optional additional filters. Abbreviations: DAR, differentially accessible region; DEG, differentially expressed gene; ChIP, ChIP-Seq binding sites; motifs, DNA binding motifs; FDR, false discovery rate; prom, promoter; distNC, distal non-coding region.

CACTUS (Chromatin ACcessibility and Transcriptomics Unification Software) is an mRNA-Seq and ATAC-Seq analysis pipeline that aims to assist researchers in formulating hypotheses about the molecular mechanisms regulating their conditions of interest. The pipeline performs standard preprocessing and differential abundance analysis, followed by enrichment analysis using various large-scale external datasets, such as databases of gene ontologies, pathways, DNA binding motifs, ChIP-Seq binding sites, and chromatin states. Currently, Cactus can analyze data from any of the four ENCODE/modENCODE species: H. sapiens, M. musculus, D. melanogaster, and C. elegans. The pipeline is designed to be easy to use for people without bioinformatics skills; efficient and reproducible, through the use of the workflow language Nextflow and various tool managers (Singularity, Docker, Conda, Mamba); and flexible, with many parameters available to customize the analysis. Output files are easy to view (e.g., MultiQC reports, merged and individual PDFs and tables, formatted Excel tables) and to interpret (e.g., standardized downstream analysis figures, customizable heatmaps).
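To make the enrichment step described above more concrete, here is a minimal Python sketch of an over-representation test: given a set of genes from a differential analysis and an annotated set (for example, a pathway), it asks whether the two overlap more than expected by chance. This is an illustration only, not Cactus's own code. The one-sided hypergeometric test is an assumption made for the example, and the function names and gene identifiers are hypothetical; the statistical test and implementation that Cactus actually uses may differ.

```python
from math import comb


def enrichment_pvalue(overlap: int, set_size: int, target_size: int, universe_size: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    overlap       -- genes found in both the query set and the annotated set
    set_size      -- size of the query set (e.g. differentially expressed genes)
    target_size   -- size of the annotated set (e.g. a pathway)
    universe_size -- number of background genes considered
    """
    upper = min(set_size, target_size)
    total = comb(universe_size, set_size)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(target_size, k) * comb(universe_size - target_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(query: set, annotated: set, universe: set) -> tuple:
    """Return (overlap, p-value) for a query set against one annotated set."""
    # Restrict both sets to the background universe before counting.
    query = query & universe
    annotated = annotated & universe
    overlap = len(query & annotated)
    return overlap, enrichment_pvalue(overlap, len(query), len(annotated), len(universe))


if __name__ == "__main__":
    # Hypothetical gene identifiers, used purely for illustration.
    universe = {f"gene{i}" for i in range(1, 11)}
    query = {"gene1", "gene2", "gene3"}
    pathway = {"gene1", "gene2", "gene3"}
    print(enrich(query, pathway, universe))
```

In a real analysis, a test like this would be repeated over many annotated sets, and the resulting p-values would then need multiple-testing correction (the figure legend above mentions FDR).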

This introductory section provides a quick overview of how Cactus works.

Reference: Salignon, J., Millan-Ariño, L., Garcia, M. U. & Riedel, C. G. (2024). Cactus: a user-friendly and reproducible ATAC-Seq and mRNA-Seq analysis pipeline for data preprocessing, differential analysis, and enrichment analysis. Links: Genomics, bioRxiv.

License: This source code is released under the MIT license, included here.