smithlabcode / ribotricer

A tool for accurately detecting actively translating ORFs from Ribo-seq data
http://doi.org/djv4
GNU General Public License v3.0

Ribotricer Output #139

Closed bshim181 closed 11 months ago

bshim181 commented 11 months ago

Hello,

I am currently trying to identify ORFs from two ribosome sequencing samples. This is a test dataset to check whether the sequencing protocol actually works (low read counts, about half a million reads). The protocol is based on the paper "Transcriptome-wide measurement of translation by ribosome profiling" by Nicholas J McGlincy.

In the analysis workflow, I perform adapter trimming, align the reads to rRNA with Bowtie2, output the non-aligned reads, and then align those to a STAR transcriptome index.

I then passed the resulting BAM files directly to Ribotricer but obtained no metagenes or translating ORFs (with the default cutoff for human).

I am not sure whether this is an incomplete adapter trimming issue or a consequence of using the STAR aligner (which might not be optimal for short reads). Could you give me some insight into this output?

STAR Output

                             Started job on |   Jul 13 13:07:09
                         Started mapping on |   Jul 13 13:10:36
                                Finished on |   Jul 13 13:11:08
   Mapping speed, Million of reads per hour |   35.59

                      Number of input reads |   316397
                  Average input read length |   37
                                UNIQUE READS:
               Uniquely mapped reads number |   57426
                    Uniquely mapped reads % |   18.15%
                      Average mapped length |   30.13
                   Number of splices: Total |   10824
        Number of splices: Annotated (sjdb) |   10824
                   Number of splices: GT/AG |   7261
                   Number of splices: GC/AG |   3550
                   Number of splices: AT/AC |   11
           Number of splices: Non-canonical |   2
                  Mismatch rate per base, % |   0.47%
                     Deletion rate per base |   0.00%
                    Deletion average length |   1.69
                    Insertion rate per base |   0.00%
                   Insertion average length |   1.00

Bowtie2 Alignment Output to rRNA (would expect about 30% alignment to rRNA)

526556 reads; of these:
  526556 (100.00%) were unpaired; of these:
    316397 (60.09%) aligned 0 times
    59203 (11.24%) aligned exactly 1 time
    150956 (28.67%) aligned >1 times
39.91% overall alignment rate

841821 reads; of these:
  841821 (100.00%) were unpaired; of these:
    456254 (54.20%) aligned 0 times
    112594 (13.38%) aligned exactly 1 time
    272973 (32.43%) aligned >1 times
45.80% overall alignment rate

For Adapter Trimming

   fastx_clipper -i input -a AGATCGGAAGAGCAC (constant linker sequence) -l 20 -Q33 -c -n -v -o output
   cutadapt --report=minimal -u 2 -m 16 -O 8 -a PEA1=NNNNNATCGT -o output input

saketkc commented 11 months ago

Your input seems correct, with the caveat that ~60k reads might be too few for a genome-wide assay, particularly when we are looking for per-base resolution coverage. For pilot runs, I usually recommend 1M uniquely mapped mRNA reads. This would explain why ribotricer isn't able to output anything.
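The depth check described above can be sketched in Python. This is only an illustration, not part of ribotricer: the function names and the `MIN_UNIQUE_READS` constant are hypothetical, and the threshold simply encodes the 1M rule of thumb from this comment. It reads the `key | value` lines of a STAR final log and compares the uniquely mapped read count against the threshold.

```python
# Hypothetical threshold: the 1M uniquely mapped reads suggested in this thread.
MIN_UNIQUE_READS = 1_000_000


def parse_star_log(text):
    """Collect 'key | value' lines from a STAR final log into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def unique_read_summary(text, min_unique=MIN_UNIQUE_READS):
    """Summarise input and uniquely mapped reads and flag insufficient depth."""
    stats = parse_star_log(text)
    n_input = int(stats["Number of input reads"])
    n_unique = int(stats["Uniquely mapped reads number"])
    return {
        "input": n_input,
        "unique": n_unique,
        "unique_pct": round(100 * n_unique / n_input, 2),
        "enough_depth": n_unique >= min_unique,
    }


# Two lines copied from the STAR output posted in this issue.
example_log = """
                      Number of input reads |   316397
               Uniquely mapped reads number |   57426
"""
summary = unique_read_summary(example_log)
print(summary)
```

For the log posted here, `enough_depth` comes out `False`, which matches the point that roughly 57k unique reads is far below the suggested pilot depth.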

From your STAR output, it does seem to be having a hard time mapping, which could reflect improper trimming. STAR also reports whether the remaining reads were unmappable because of mismatches (I don't see that part here, and it would be a hint as to what is really going wrong). I have used STAR extensively for ribo-seq data analysis - it works great for short fragments as well. By default STAR soft-clips bases at the read ends (unless you set alignEndsType to EndToEnd), so even with adapters present it will try really hard to map; improper adapter trimming therefore might not fully explain what is going on here.
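One way to see how much soft clipping is actually happening is to look at the CIGAR strings of the aligned reads. The sketch below is a rough diagnostic under stated assumptions: it works on SAM text lines (for example, a BAM converted to SAM beforehand), the function names are hypothetical, and the toy records are made up for illustration. A high fraction of soft-clipped alignments could point to leftover adapter sequence.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Total number of soft-clipped (S) bases in one CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def soft_clip_fraction(sam_lines):
    """Fraction of mapped SAM records with at least one soft-clipped base."""
    total = 0
    clipped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue  # malformed or unmapped record
        total += 1
        if soft_clipped_bases(fields[5]) > 0:
            clipped += 1
    return clipped / total if total else 0.0


# Made-up toy records: only the first six SAM columns matter for this check.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t30M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t255\t28M7S\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
fraction = soft_clip_fraction(toy_sam)
print(fraction)
```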

I haven't used fastx_clipper, but it seems you are asking it to trim a TruSeq adapter, which I hope is correct. I usually use trim_galore and let it figure out the adapter - it also reports how many sequences contained the adapter, which is a good metric for confirming that the trimming went fine.
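A quick sanity check along the same lines is to count how many reads still contain the start of the adapter, before and after trimming. The following is a simplified sketch, not how trim_galore works internally: it uses exact substring matching on the first few adapter bases, the function names and the `min_overlap` default are assumptions, and the toy FASTQ records are made up.

```python
# Linker sequence quoted in the fastx_clipper command earlier in this thread.
ADAPTER = "AGATCGGAAGAGCAC"


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(reads, adapter=ADAPTER, min_overlap=8):
    """Fraction of reads containing the first `min_overlap` adapter bases.

    Exact matching only, so reads with sequencing errors in the adapter
    are missed; treat the result as a lower bound.
    """
    seed = adapter[:min_overlap]
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if seed in read)
    return hits / len(reads)


# Made-up toy FASTQ: the first read carries the adapter, the second does not.
toy_fastq = [
    "@read1",
    "ACGTACGTACGTACGTACGTAGATCGGAAGAGCAC",
    "+",
    "I" * 35,
    "@read2",
    "TTGACCTGAACTGGTCAAGGCTTAACGG",
    "+",
    "I" * 28,
]
untrimmed = adapter_fraction(read_fastq_sequences(toy_fastq))
print(untrimmed)
```

Running the same function on the trimmed FASTQ should give a value close to zero if trimming worked; a value that stays high would suggest the wrong adapter sequence was supplied.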

bshim181 commented 11 months ago

Have you had any experience with the Ribo-ORF pipeline, and might it be better suited to cases with small read counts like this one?

saketkc commented 11 months ago

I haven't tried it, but you are welcome to test and see what works best for you. I will close this issue since it is not related to ribotricer's functionality, but feel free to re-open if you have any questions related to ribotricer.