sophiahjli / MitoRiboSeq

Codes for processing mitochondrial ribosome profiling data
MIT License
3 stars 1 forks source link

MitoRiboSeq

This workflow is used to generate codon- and gene-based analysis from mitochondrial ribosome profiling data generated using the protocols described in Monitoring mitochondrial translation with ribosome profiling in Nature Protocol by Li et al. 2020.

Dependencies are installed using Bioconda. The workflow is written using Snakemake.

Overview

This workflow is designed to take FASTQ files from the Illumina sequencer and map each mitoribosome footprint to its occupied A-site. This mapping will allow detailed monitoring of mitoribosome translation dynamics.

Starting with FASTQ files, the workflow is divided into three main parts: QC and metagene analysis, A-site assignment and codon count table generation, and downstream codon occupancy analysis. In the first part, the workflow will run QC on the input FASTQ files, then align all of the reads to the genome (nuclear and micorhondrial genome together), and finally perform a metagene analysis. The metagene analysis helps the user to determine the offset from the 3' end of the read to the ribosomal A-site. Once the offset is determined, all of the reads will be assigned to the A-site at the nucleotide level and the counts for each codon in each gene are arranged in a table to facilitate downstream analysis. Once the codon count table is generated, two common analyses, which are presented in Figure 7 of the manuscript, It includes codon occupancy analysis and cumulative mitoribosome footprint along the transcripts, are provided to visaulize mitoribosome distribution on mitochondria-encoded genes.

Inputs

Outputs

Intermediate outputs

Workflow

  1. FASTQ summary and QC metrics - Use FastQC to determine some basic QC metrics from the raw FASTQ files
  2. Trim reads - Trim adapter sequences and low quality bases from fastq files using cutadapt.
  3. Align reads - Use BWA to align reads to the genome
  4. Metagene analysis - Use the plastid scripts metagene to provide a profile of counts relative to start and stop codons to determine the A-site and P-site offset for the experiment
  5. Read Phasing Analysis - Use the plastid script phase_by_size to estimate sub-codon phasing, stratified by read length
  6. Generate codon count table - Use plastid to generate codon count table with the offset determined in the Metagene analysis (assign each reads to the corresponding ribosomal A-site)
  7. Visualize - Visualize the ribosomal A-site coverage by generating genome browser tracks (wiggle format)
  8. Codon occupancy Analysis - Written in R to generate three figures. The first one uses hierarchical clustering and heatmap to compare samples by their codon occupancy. The second one shows the codon occupancy result for each sample individually. The third one identifies ribosome stalling site along the transcripts by plotting the cumulative ribosome counts.

Install prerequisites

  1. Install conda

    • If you have Anaconda installed, you already have it.
    • Otherwise, install the Miniconda package.
  2. Enable the Bioconda channel (requires 64-bit Linux or Mac OS, Windows is not supported)

    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge

Setup environment and run workflow

  1. Clone workflow into working directory

    git clone https://github.com/sophiahjli/MitoRiboSeq.git
    cd MitoRiboSeq

    NOTE: Do not install into a path that includes spaces (' '). Spaces can cause issues with some conda packages.

  2. Input data

    FASTQ files - the FASTQ data from the sequencer should be stored in data/fastq in fastq.gz format, one file per sample.

  3. Edit configuration files as needed

    cp code/mito_config.defaults.yml code/mito_config.yml
    nano code/mito_config.yml
    
    # Only if running on a cluster
    cp cluster_config.yml mycluster_config.yml
    nano mycluster_config.yml
  4. Install dependencies into an isolated environment

    conda env create --file code/mitoriboseq_environment.yml

    Note: If you are updating the workflow, you may need to update the conda environment

    conda env update --file code/mitoriboseq_environment.yml
  5. Activate the environment

    source activate MitoRiboSeq
  6. Execute the trimming, mapping, read phasing, and metagene workflow

    snakemake \
         -s code/mito_readphasing_metagene.snakefile \
         --configfile "code/mito_config.yml" \
         --use-conda \
         --cores 4

    If --configfile is not specified, the defaults are used.

  7. Execute the codon occupancy workflow

    snakemake \
         -s code/mito_codontable.snakefile \
         --configfile "code/mito_config.yml" \
         --use-conda \
         --cores 4

Common snakemake options

See the Snakemake documentation for a list of all options.

Snakemake Examples

Running workflow on a single computer with 4 threads

snakemake --configfile "myconfig.yml" --use-conda --cores 4

Running workflow using SLURM

snakemake \
    --configfile "myconfig.yml" \
    --cluster-config "mycluster_config.yml" \
    --cluster "sbatch --cpus-per-task={cluster.n} --mem={cluster.memory} --time={cluster.time}" \
    --use-conda \
    --cores 100

Running workflow on Princeton LSI cluster using DRMAA

snakemake \
    --configfile "myconfig.yml" \
    --cluster-config "cetus_cluster.yml" \
    --drmaa " --cpus-per-task={cluster.n} --mem={cluster.memory} --qos={cluster.qos} --time={cluster.time}" \
    --use-conda \
    --cores 1000 \
    --output-wait 60