hw538 / cfDNAPro

cfDNAPro specializes in standardized and robust cfDNA fragmentomic analysis
GNU General Public License v3.0

cfDNAPro

Official tutorials

For detailed documentation please visit: https://cfdnapro.readthedocs.io/en/latest/

Declaration

cfDNAPro is designed for research only.

Challenges in the cfDNA fragment length calculation

Unlike genomic DNA, cfDNA has specific fragmentation patterns. Alignment software defines "fragment length" ambiguously, which raises concerns: see the footnote on page 9 of the SAM file format specification: https://samtools.github.io/hts-specs/SAMv1.pdf
Fragmentomic analysis of cell-free DNA data requires single-molecule resolution, which further emphasizes the importance of accurate, unbiased feature extraction.

cfDNAPro is designed to resolve this issue and standardize cfDNA fragmentomic analysis using the Bioconductor R ecosystem.
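
To make the ambiguity concrete, here is a minimal base-R sketch with made-up coordinates for one overlapping read pair. The variable names and numbers are illustrative assumptions, not cfDNAPro output. It contrasts two plausible readings of "fragment length": the span from the leftmost to the rightmost mapped base, and the span between the 5' ends of the two reads.

```r
# Hypothetical read pair (1-based, inclusive coordinates); values are invented
# for illustration only.
fwd_read <- list(start = 100L, end = 149L)  # read on the forward strand
rev_read <- list(start = 90L,  end = 139L)  # mate on the reverse strand

# Definition A: leftmost mapped base to rightmost mapped base.
len_outer <- max(fwd_read$end, rev_read$end) -
  min(fwd_read$start, rev_read$start) + 1L

# Definition B: 5' end of the forward read to 5' end of the reverse read.
# The 5' end of a reverse-strand read is its rightmost mapped base.
len_five_prime <- rev_read$end - fwd_read$start + 1L

c(outer = len_outer, five_prime = len_five_prime)
```

For this pair the two definitions disagree (60 versus 40 bases), which is the kind of discrepancy a standardized calculation has to settle.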

Input

cfDNAPro is written specifically for cell-free DNA paired-end whole-genome sequencing data. It ensures accurate (i.e., up-to-standard) calculation of fragmentomic features (e.g., fragment lengths and motifs).

Output

cfDNAPro extracts (i.e., quantifies in a standardised and robust way) fragmentomic features/biomarkers.

Feature extraction depends on essential data objects/R packages in the Bioconductor ecosystem, such as Rsamtools, plyranges, GenomicAlignments, GenomeInfoDb and Biostrings.
Data engineering depends on packages in the tidyverse ecosystem, such as dplyr and stringr.
All plots depend on the ggplot2 R package.

For issues, feature requests, etc., please contact:
Author: Haichao Wang
wanghaichao2014@gmail.com
Author: Paulius D. Mennea
paulius.mennea@cruk.cam.ac.uk
Nitzan Rosenfeld Lab admin mailbox:
Rosenfeld.LabAdmin@cruk.cam.ac.uk

Quick Start 1

Read in a BAM file and return the fragment length counts. A straightforward and frequent use case is calculating the fragment sizes in a BAM file with the following code:


# install the newest version of cfDNAPro
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("hw538/cfDNAPro", build_vignettes = FALSE)

# calculate the insert sizes of a bam file
library(cfDNAPro)
frag_lengths <- read_bam_insert_metrics(bamfile = "/path/to/bamfile.bam")

The returned data frame contains two columns: "insert_size" (the fragment length) and "All_Reads.fr_count" (the number of fragments with that length).
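
As a rough sketch of what can be done with such a table, the snippet below builds a small made-up data frame with the same two column names and derives the modal fragment length and the fraction of fragments shorter than 150 bp. The counts and the 150 bp cut-off are illustrative assumptions; with real data, `toy` would be replaced by the data frame returned by `read_bam_insert_metrics()`.

```r
# Toy stand-in for the returned data frame; the counts are invented.
toy <- data.frame(
  insert_size = c(120L, 145L, 167L, 180L, 330L),
  All_Reads.fr_count = c(10, 40, 100, 30, 20)
)

# Most frequent fragment length (the mode of the distribution).
mode_length <- toy$insert_size[which.max(toy$All_Reads.fr_count)]

# Fraction of fragments shorter than 150 bp (hypothetical cut-off).
short_fraction <- sum(toy$All_Reads.fr_count[toy$insert_size < 150]) /
  sum(toy$All_Reads.fr_count)

mode_length
short_fraction
```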

Quick Start 2

Read a BAM file and return the fragment names (i.e., the read names in the BAM file) and alignment coordinates as a GRanges object in R. If needed, you can convert the GRanges object into a data frame; the fragment length is stored in the "width" column.


library(cfDNAPro)

# read bam file, do alignment curation
frags <- readBam(bamfile = "/path/to/bamfile.bam")
# convert GRanges object to a dataframe in R
frag_df <- as.data.frame(frags)
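
To illustrate the conversion without a BAM file, the sketch below builds a small GRanges object by hand using the GenomicRanges package and converts it to a data frame. The coordinates and fragment names are invented, and the object returned by `readBam()` may carry additional metadata columns.

```r
library(GenomicRanges)

# Hand-made fragments; names and coordinates are invented for illustration.
toy_frags <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(1000, 5000, 200), end = c(1166, 5144, 529))
)
names(toy_frags) <- c("frag_a", "frag_b", "frag_c")

# as.data.frame() yields seqnames, start, end, width and strand columns;
# width = end - start + 1, i.e. the fragment length.
toy_df <- as.data.frame(toy_frags)
toy_df$width
```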

News

cfDNAPro 1.7.1 (Aug 2024)

cfDNAPro 1.7.1 (May 2023)

Installation

Please install our latest version (highly recommended):

if (!require(devtools)) install.packages("devtools")
library(devtools)
devtools::install_github("hw538/cfDNAPro", build_vignettes = TRUE, dependencies = TRUE)
# run below instead if you don't want to build vignettes inside R
# devtools::install_github("hw538/cfDNAPro", build_vignettes = FALSE, dependencies = FALSE)

Or install the released/stable version via Bioconductor (i.e., not the newest version; some functions shown on this webpage might be missing):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cfDNAPro")
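
After either installation route, it can help to confirm which version ended up in the library. A minimal sketch using only base R and utils functions follows; the variable name `installed` is an illustrative choice.

```r
# Check whether cfDNAPro can be loaded, then report its version if so.
installed <- requireNamespace("cfDNAPro", quietly = TRUE)

if (installed) {
  message("cfDNAPro version: ", as.character(utils::packageVersion("cfDNAPro")))
} else {
  message("cfDNAPro is not installed.")
}
```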

Vignettes/tutorials

Visit: https://cfdnapro.readthedocs.io/en/latest/

Citation

Please cite package ‘cfDNAPro’ in publications:

Haichao Wang, Paulius Mennea, Elkie Chan, Hui Zhao, Christopher G. Smith, Tomer Kaplan, Florian Markowetz, Nitzan Rosenfeld (2024). cfDNAPro: An R/Bioconductor package to extract and visualise cell-free DNA biological features. R package version 1.7.1. https://github.com/hw538/cfDNAPro