riboviz / example-datasets

Example datasets to run with RiboViz
Apache License 2.0

Add Lareau et al 2014 Saccharomyces cerevisiae - troubleshooting #58

Closed swinterbourne closed 3 years ago

swinterbourne commented 3 years ago

Continuation of issue ticket #55

Overarching goal: add the datasets from Lareau et al. 2014, with the aim of replicating the results for the 17 inhibitory codon pairs defined by Gamble et al. 2016 in the paper "Adjacent Codons Act in Concert to Modulate Translation Efficiency in Yeast".

Goal for this issue ticket: troubleshoot what is causing the outputs to be so irregular.

- Replicate 1: GSM1406453
- Replicate 2: GSM1406454
- Replicate 3: GSM1406455

The adaptor sequence used was CTGTAGGCACCATCAAT, which the Lareau et al. 2014 paper refers to as the "Universal miRNA cloning linker from New England Biolabs, Ipswich, MA (cat# S1315S)".
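To make concrete what removing this 3' adaptor does to a read, here is a minimal Python sketch. It is an illustration only, not how the pipeline trims adaptors: the helper name `trim_adaptor` and the example footprint are hypothetical, and the sketch handles only an exact, full-length adaptor match.

```python
ADAPTOR = "CTGTAGGCACCATCAAT"  # adaptor sequence quoted in this issue


def trim_adaptor(read: str, adaptor: str = ADAPTOR) -> str:
    """Return the read with the first exact adaptor match, and everything after it, removed.

    Hypothetical helper: real trimming tools also handle mismatches and
    partial adaptors at the end of a read, which this sketch does not.
    """
    idx = read.find(adaptor)
    return read if idx == -1 else read[:idx]


# Made-up 28 nt footprint followed by the adaptor and some trailing sequence.
footprint = "ATGGCTAAGGTTCCAGAATTGACCAAGT"
raw_read = footprint + ADAPTOR + "AAAA"

trimmed = trim_adaptor(raw_read)
print(len(raw_read), len(trimmed))
```

If the wrong adaptor sequence were supplied, reads would pass through untrimmed, so checking the trimmed length distribution is a quick sanity test.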

The output graphs for Replicate 1 indicate something went wrong. For example:

- There are aberrant peaks in the 3nt_periodicity graph.
- There are two peaks present in the read_length graph.
- The startcodon_ribogrid counts are overall close to zero.

The output graphs for Replicate 1 are: 3nt_periodicity.pdf, 3ntframe_propbygene.pdf, codon_ribodens.pdf, features.pdf, pos_sp_rpf_norm_reads.pdf, read_lengths.pdf, startcodon_ribogrid.pdf, startcodon_ribogridbar.pdf

The output graphs for Replicate 2 also indicate something went wrong. For example:

- The 3nt_periodicity graph lacks clear periodicity.
- The pos_sp_rpf_norm_reads graph is aberrant.
- Again, there are two peaks present in the read_length graph.
- The startcodon_ribogrid counts are overall close to zero.

The output graphs for Replicate 2 are: 3nt_periodicity.pdf, 3ntframe_propbygene.pdf, codon_ribodens.pdf, features.pdf, pos_sp_rpf_norm_reads.pdf, read_lengths.pdf, startcodon_ribogrid.pdf, startcodon_ribogridbar.pdf

The output graphs for Replicate 3 are also weird. For example:

- There are aberrant peaks in the 3nt_periodicity graph.
- The pos_sp_rpf_norm_reads graph is also very irregular.
- A peak is present at 20 nt in the read_length graph.
- The startcodon_ribogridbar graph is very irregular and seems to be lacking a lot of reads.

The output graphs for Replicate 3 are: 3nt_periodicity.pdf, 3ntframe_propbygene.pdf, codon_ribodens.pdf, features.pdf, pos_sp_rpf_norm_reads.pdf, read_lengths.pdf, startcodon_ribogrid.pdf, startcodon_ribogridbar.pdf

The majority of the alignments were lost, according to the read_counts.tsv file:

- Replicate 1 went from 34 687 486 to 7 022 487 reads
- Replicate 2 went from 35 838 179 to 5 900 551 reads
- Replicate 3 went from 54 362 276 to 2 789 857 reads
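To put those numbers on a common scale, the fraction of reads retained per replicate can be worked out directly. The sketch below uses only the counts quoted above; the dictionary layout and the helper name `retained_fraction` are illustrative assumptions, not part of the read_counts.tsv format.

```python
# Start and end read counts per replicate, copied from the figures quoted above.
counts = {
    "Replicate 1": (34_687_486, 7_022_487),
    "Replicate 2": (35_838_179, 5_900_551),
    "Replicate 3": (54_362_276, 2_789_857),
}


def retained_fraction(before: int, after: int) -> float:
    """Fraction of the starting reads that are still present at the end."""
    return after / before


for name, (before, after) in counts.items():
    print(f"{name}: {retained_fraction(before, after):.1%} of reads retained")
```

This comes out at roughly 20%, 16% and 5% retained for Replicates 1, 2 and 3, so Replicate 3 loses proportionally the most reads.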

lianafaye commented 3 years ago

Hi @swinterbourne @ewallace - happy to fill in info on the Lareau 2014 data. Those footprints come from this paper: https://elifesciences.org/articles/01257

in which we showed that ribosomes protect two footprint sizes, ~20 nt and ~28 nt. This size range shifts depending on which translation inhibitor drugs are used. When people use cycloheximide to block translation, they get only 28 nt footprints, and because the very first ribosome profiling experiments used that drug, that's what the field standardized on.

So, seeing two footprint-size peaks is exactly expected.

For a few reasons, the data in our paper don't match up with tAI etc as cleanly as some experiments -- because we didn't use any inhibitors to freeze ribosomes in place, and because of how the counts are divided across two footprint sizes that reflect different stages of the physical process of translation elongation.

I've glanced through the files and they look ok. For instance, if you remove the really big peak at the start codon, the rest of the data have pretty clear 3 nt periodicity, it's just compressed on the plot by that one big peak. But let me know if there are remaining oddities - it's totally possible our data might break some aspects of the pipeline.
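The compression effect described here can be shown with a small Python sketch. The counts are made up purely for illustration (one very large start-codon peak followed by a repeating 3 nt pattern), and `scale_to_max` is a hypothetical helper that mimics how a plot's y-axis is set by its tallest bar.

```python
def scale_to_max(counts):
    """Scale counts so that the tallest position is 1.0, as a plot's y-axis effectively does."""
    top = max(counts)
    return [c / top for c in counts]


# Hypothetical counts: a huge peak at the start codon, then a 3 nt periodic signal.
counts = [5000] + [20, 10, 90] * 5

with_peak = scale_to_max(counts)
without_peak = scale_to_max(counts[1:])  # same data with the start-codon peak dropped

# Height of the tallest remaining position relative to the axis maximum,
# first with the peak included and then with it removed.
print(max(with_peak[1:]), max(without_peak))
```

With the peak included, the periodic positions occupy only a sliver of the axis; once it is dropped, the same 3 nt pattern fills the full range.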

ewallace commented 3 years ago

Extra note: 3' and 5' are in the wrong order

factor(., levels = c("5'", "3'"))

ewallace commented 3 years ago

This looks so good! Both the dataset and the .html