Code for analysis : Darnell, Subramaniam, and O'Shea 2018

This github repository includes jupyter notebooks (and associated static HTML files) for reproducing all plots in the paper, and R scripts for sequencing data analysis, in scripts/

Annotation files
Generating bowtie indices and annotation files
Pipeline for processing ribosome profiling sequencing data

Annotation files

The following bowtie indices / annotation files should be generated in new folders called bowtie_indices and sequence_annotation_files before running the script below. These files are not provided in the repository due to their large size, but instructions for generating them are included below.

bowtie index for subtractive alignment of rRNA contaminants: hg38.rrna
transcriptome rsem reference bowtie index for alignment: genome_rsem/hg38.gencode.v24.rsem/bowtie
annotations in fai format for preparing transcriptome rsem reference bowtie index: hg38.fa.masked
canonical CCDS subset annotations for assigning reads after alignment: gencode.v24.canonical_ccds_transcripts.20170315.gff3
mouse gencode annotations for matching ATF4/CHOP targets to human homologs: gencode.vM9.annotation.gff3.gz

Generating bowtie indices and annotation files

Required software

bowtie/1.1.1
rsem/1.3.0
samtools/1.4.0
gzip/1.3.12
R/3.3.3
python/2.7.13

Download, unzip, and index hg38 genome annotations (hg38.fa.masked)

rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.masked.gz .

gunzip hg38.fa.masked.gz

samtools faidx hg38.fa.masked

Download Gencode annotations and extract unique CCDS annotations to a gff3 file (gencode.v24.canonical_ccds_transcripts.20170315.gff3)

wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.annotation.gff3.gz

gunzip gencode.v24.annotation.gff3.gz

Rscript make_unique_ccds_transcripts.R

Download mouse Gencode annotations

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M9/gencode.vM9.annotation.gff3.gz

Download rRNA sequences from NCBI, write to fasta, and create human rRNA bowtie index

from Bio import Entrez
from Bio import SeqIO

Entrez.email = 'YOUR_EMAIL_ADDRESS@XYZ.EDU'

refseqids = {'28S' : 'NR_003287.2',
             '18S' : 'NR_003286.2',
             '5.8S' : 'NR_003285.2',
             '5S' : 'NR_023363.1',
            }

outputlist = []
for (name,ID) in refseqids.items():
    # identify ids by searching
    rec = Entrez.read(Entrez.esearch(db="nucleotide", term=ID))
    # retrieve full record for the found id
    handle = Entrez.efetch(db="nucleotide", id=rec["IdList"][0], rettype="fasta")
    output = SeqIO.read(handle, 'fasta')
    # clean up record to be minimalistic
    output.id = ID
    output.description = output.name = ''
    outputlist.append(output)

SeqIO.write(outputlist, 'sequence_annotation_files/rrna.fa', 'fasta')

/app/bowtie/1.1.1/bowtie-build sequence_annotation_files/rrna.fa hg38.rrna

Prepare gencode transcriptome index (hg38.gencode.v24.rsem) with rsem for alignment

rsem-prepare-reference --bowtie-path /app/bowtie/1.1.1 --bowtie --gff3 sequence_annotation_files/gencode.v24.annotation.gff3 sequence_annotation_files/hg38.fa.masked bowtie

Pipeline for processing ribosome profiling sequencing data

Starting with .fastq files.

Required software

bowtie/1.1.1
cutadapt/1.4.1
R/3.3.3
rsem/1.3.0

Make a directory for each sample

for SAMPLE in `ls rawdata/*.fq.gz | cut -f1 -d. | cut -f2 -d/`;
do
    mkdir processeddata/$SAMPLE; 
    echo $SAMPLE; 
done

Trim reads

for SAMPLE in `ls rawdata/*.fq.gz | cut -f1 -d. | cut -f2 -d/`; 
do 
    cutadapt --adapter=AAAAAAAAAA --minimum-length=13 --discard-untrimmed --output processeddata/$SAMPLE/$SAMPLE.trim.fq rawdata/$SAMPLE.R1.fq.gz; 
done

Subtractive alignment to rRNA

for SAMPLE in `ls processeddata/*/*.trim.fq | cut -f1 -d. | cut -f2 -d/`; 
do 
    bowtie --seedlen=23 --threads=8 --un processeddata/$SAMPLE/norrna.fq --sam bowtie_indices/hg38.rrna processeddata/$SAMPLE/$SAMPLE.trim.fq; 
done

Align to transcriptome


for SAMPLE in `ls processeddata/*/*.norrna.fq | cut -f1 -d. | cut -f2 -d/`; 
do
    rsem-calculate-expression --num-threads 8 --output-genome-bam --sort-bam-by-coordinate  processeddata/$SAMPLE/$SAMPLE.norrna.fq /n/osheafs1/LAB/alicia/bioinformatics/bowtie_indices/genome_rsem/hg38.gencode.v24.rsem/bowtie processeddata/$SAMPLE/gencode; 
done

Calculate codon-specific ribosome density


for SAMPLE in `ls processeddata/*/*norrna* | cut -f1 -d. | cut -f2 -d/`; 
do 
    Rscript scripts/calculate_codon_ribo_density_150overhang_20170807.R $SAMPLE; echo $SAMPLE; 
done

Run R script to calculate gene fold changes

for SAMPLE in `ls processeddata/*/*norrna* | cut -f1 -d. | cut -f2 -d/`; 
do 
    Rscript scripts/calculate_gene_fold_change_20170615.R $SAMPLE; echo $SAMPLE; 
done

rasilab / adarnell_2018

readme

Code for analysis : Darnell, Subramaniam, and O'Shea 2018

Table of Contents

Annotation files

Generating bowtie indices and annotation files

Required software

Download, unzip, and index hg38 genome annotations (hg38.fa.masked)

Download Gencode annotations and extract unique CCDS annotations to a gff3 file (gencode.v24.canonical_ccds_transcripts.20170315.gff3)

Download mouse Gencode annotations

Download rRNA sequences from NCBI, write to fasta, and create human rRNA bowtie index

Prepare gencode transcriptome index (hg38.gencode.v24.rsem) with rsem for alignment

Pipeline for processing ribosome profiling sequencing data

Required software

Make a directory for each sample

Trim reads

Subtractive alignment to rRNA

Align to transcriptome

Calculate codon-specific ribosome density

Run R script to calculate gene fold changes