ycl6 / 16S-rDNA-V3-V4

16S rDNA V3-V4 amplicon sequencing analysis using dada2, phyloseq, LEfSe, picrust2 and other tools. Demo: https://ycl6.github.io/16S-Demo/
GNU General Public License v3.0
33 stars 18 forks source link
16s bioinformatics dada2 illumina lefse microbiome microbiota phyloseq picrust2

16S rDNA V3-V4 amplicon sequencing analysis

This GitHub repository includes codes and scripts that demonstrate the use of dada2 and phyloseq (and associated tools and R packages) to analyze 16S rDNA amplicon sequencing data. An working example is included in the example folder.

Disclaimer: Do not use any of the provided codes and scripts in production without fully understanding of the contents. I am not responsible for errors or omissions or for any consequences from use of the contents and make no warranty with respect to the currency, completeness, or accuracy of the contents from this GitHub repository.

Set up conda environment

I strongly recommend using conda to manage the software and R packages required for this analysis.

# Create main "16S" env
# Remove `picrust2=2.4` from the list if it causes conflicts
conda create -n 16S -c conda-forge -c bioconda r-base=4.1 python=3 curl libcurl openssl boost-cpp \
r-doparallel r-devtools r-rcurl r-httr r-magick r-png r-ggplot2 r-data.table r-phangorn r-ape r-gridextra \
r-ggbeeswarm r-ggrepel r-vegan r-tidyverse r-gtools r-r.utils bioconductor-dada2 bioconductor-phyloseq \
bioconductor-decipher bioconductor-deseq2 bioconductor-shortread bioconductor-biostrings \
bioconductor-biomformat bioconductor-aldex2 cutadapt raxml raxml-ng lefse picrust2=2.4

# Use "conda activate 16S" to activate this environment
# Create "graphlan" env
conda create -n graphlan export2graphlan graphlan

# Use "conda activate graphlan" to activate this environment
# Create "picrust2" env if it causes conflicts when setting up the "16S" env
conda create -n picrust2 picrust2=2.4

# Use "conda activate picrust2" to activate this environment

Workflow

1. PCR primer trimming using cutadapt

Usage: perl run_trimming.pl project_folder fastq_folder forward_primer_sequence reverse_primer_sequence
Usage: perl run_trimming.pl PRJEB27564 raw CCTACGGGNGGCWGCAG GACTACHVGGGTATCTAATCC

2. Run DADA2 workflow

2a.

2b.

3. Explore microbiome profiles using phyloseq

4. Identify and plot differential taxa features using LEfSe and GraPhlAn

5. Infer functional capacity using picrust2 and statistical analysis using ALDEx

Download Silva and NCBI taxonomy DB

  1. Silva taxonomic training data formatted for DADA2 (Silva version 138.1)

https://zenodo.org/record/4587955#.YNWax3VKhkY

Note: These files are intended for use in classifying prokaryotic 16S sequencing data and are not appropriate for classifying eukaryotic ASVs.

  1. Create from NCBI 16SMicrobial blast DB

Install BLAST

conda install -c bioconda blast

or

sudo apt-get install ncbi-blast+

Download NCBI's 16S rRNA BLAST DB

wget ftp://ftp.ncbi.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz
tar zxf 16S_ribosomal_RNA.tar.gz

Convert 16SMicrobial BLAST DB into FASTA format

blastdbcmd -db 16S_ribosomal_RNA -entry all -out 16SMicrobial.fa
gzip 16SMicrobial.fa

Download taxa_summary.R

http://evomics.org/phyloseq/taxa_summary-r/

Use gunzip taxa_summary.R.gz to extract file