Closed: bshim181 closed this issue 11 months ago.
Your input seems correct, with the caveat that ~60k reads might be too few for a genome-wide assay, particularly when we are looking for coverage at per-base resolution. For pilot runs, I usually recommend 1M uniquely mapped mRNA reads. This would explain why ribotricer isn't able to output anything.
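One quick way to check the depth point is to read the uniquely mapped count from STAR's Log.final.out and compare it with the ~1M pilot target. This is a minimal sketch, assuming the usual `key | value` layout of that file; the function names and the excerpt below are hypothetical, and STAR's count covers all uniquely mapped reads, not only mRNA reads, so it is an upper bound on what ribotricer can use.

```python
def parse_star_final_log(text: str) -> dict[str, str]:
    """Parse STAR's Log.final.out ('key | value' lines) into a dict of strings."""
    stats: dict[str, str] = {}
    for line in text.splitlines():
        if "|" in line:
            key, value = line.split("|", 1)
            stats[key.strip()] = value.strip()
    return stats


def depth_check(text: str, target: int = 1_000_000) -> tuple[int, bool]:
    """Return (uniquely mapped reads, whether that meets the pilot target)."""
    unique = int(parse_star_final_log(text)["Uniquely mapped reads number"])
    return unique, unique >= target


if __name__ == "__main__":
    # Hypothetical excerpt; in practice pass open("Log.final.out").read()
    excerpt = "                   Uniquely mapped reads number |\t60000\n"
    unique, ok = depth_check(excerpt)
    print(f"{unique} uniquely mapped reads; meets 1M pilot target: {ok}")
```

Keeping the values as strings in the parser avoids guessing at types for lines such as dates and percentages; only the field you need gets converted.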
From your STAR output, it does seem to be having a hard time mapping, which could reflect improper trimming. STAR also reports whether the remaining reads were unmappable because of too many mismatches (which I don't see here, so that would be a hint as to what is really going wrong). I have used STAR extensively for ribo-seq data analysis, and it works well for short fragments too. By default, STAR soft-clips bases at the read ends (unless you set alignEndsType to EndToEnd), so it will try really hard to map reads even when adapters are still present. Improper adapter trimming therefore might not fully explain what is going on here.
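To see which category the unmapped reads fall into, a small regex over Log.final.out is enough. This sketch assumes STAR's `% of reads unmapped: <reason> | <pct>%` lines (too many mismatches, too short, other); the exact wording can differ between STAR versions, so check the pattern against your own log. The function name is hypothetical.

```python
import re

# Assumed line format, e.g. "% of reads unmapped: too short |	79.01%"
UNMAPPED_RE = re.compile(r"% of reads unmapped: (.+?)\s*\|\s*([\d.]+)%")


def unmapped_breakdown(log_text: str) -> dict[str, float]:
    """Map each STAR unmapped-read reason to its percentage of input reads."""
    return {reason: float(pct) for reason, pct in UNMAPPED_RE.findall(log_text)}
```

Usage: `unmapped_breakdown(open("Log.final.out").read())`, with the path adjusted to wherever STAR wrote its logs. The "Number of reads unmapped: ..." count lines are deliberately not matched, so only percentages are returned.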
I haven't used fastx_clipper, but it seems you are asking it to trim a TruSeq adapter, which I hope is correct. I usually use trim_galore and let it detect the adapter automatically; it also reports how many sequences contained the adapter, which is a good metric for confirming that trimming went fine.
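If you stay with fastx_clipper, you can approximate trim_galore's "reads with adapter" metric yourself by counting how many raw reads contain the start of the linker. This is a rough sketch: it uses an exact substring match on the first 10 nt of AGATCGGAAGAGCAC (the probe length is an arbitrary choice), so reads that end in only a shorter fragment of the adapter are not counted. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of a FASTQ file."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(seqs: Iterable[str], adapter: str, probe_len: int = 10) -> float:
    """Fraction of reads containing the first `probe_len` nt of the adapter."""
    probe = adapter[:probe_len]
    total = hits = 0
    for seq in seqs:
        total += 1
        hits += probe in seq  # bool counts as 0 or 1
    return hits / total if total else 0.0
```

Usage, with a hypothetical file name: `with open("raw.fastq") as fh: print(adapter_fraction(fastq_sequences(fh), "AGATCGGAAGAGCAC"))`. For gzipped input, open the file with `gzip.open(path, "rt")` instead.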
Have you had any experience with the Ribo-ORF pipeline, and might it be better suited to such low read counts?
I haven't tried it, but you are welcome to test and see what works best for you. I will close this issue since it is not related to ribotricer's functionality, but feel free to re-open if you have any questions related to ribotricer.
Hello,
I am currently trying to identify ORFs from two ribosome sequencing samples. This is a test dataset to check whether the sequencing protocol actually works (not many reads, about half a million). The protocol is based on the paper "Transcriptome-wide measurement of translation by ribosome profiling" by Nicholas J. McGlincy.
In my analysis workflow, I perform adapter trimming, align the reads to rRNA with Bowtie2 while writing out the non-aligned reads, and then align those reads with STAR against a transcriptome index.
I then passed the resulting BAM files directly to ribotricer but obtained no metagenes or translating ORFs (using the default cutoff for human).
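For reference, the post-trimming steps described above can be written out as explicit command lines. This is only a sketch under several assumptions: the file names and the helper are hypothetical, it assumes a STAR genome index and the coordinate-sorted BAM that STAR names `<prefix>Aligned.sortedByCoord.out.bam` (a ribotricer index built from a GTF is in genome coordinates), and every flag should be double-checked against each tool's `--help`. The commands are only printed, not executed.

```python
import shlex


def build_commands(sample: str, rrna_index: str, star_index: str,
                   ribotricer_index: str, threads: int = 4) -> list[list[str]]:
    """Return the post-trimming commands for one sample (not executed here)."""
    trimmed = f"{sample}.trimmed.fq"                 # output of the trimming step
    no_rrna = f"{sample}.norRNA.fq"                  # reads that did not hit rRNA
    bam = f"{sample}.Aligned.sortedByCoord.out.bam"  # STAR's name for this prefix
    return [
        # rRNA depletion: write reads that fail to align (--un); discard the SAM
        ["bowtie2", "-p", str(threads), "-x", rrna_index, "-U", trimmed,
         "--un", no_rrna, "-S", "/dev/null"],
        # Align the remaining reads and emit a coordinate-sorted BAM
        ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
         "--readFilesIn", no_rrna, "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", f"{sample}."],
        # Index the BAM for random access by downstream tools
        ["samtools", "index", bam],
        # Score candidate ORFs from a prebuilt ribotricer index
        ["ribotricer", "detect-orfs", "--bam", bam,
         "--ribotricer_index", ribotricer_index, "--prefix", sample],
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample1", "rRNA_idx", "star_idx",
                              "hg38_candidate_orfs.tsv"):
        print(shlex.join(cmd))
```

Building the commands as argument lists (rather than one string) keeps them safe to hand to `subprocess.run` later without shell quoting issues.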
I am not sure whether the problem is incomplete adapter trimming or the use of the STAR aligner (which might not be optimal for short reads). Could you give me some insight into this output?
STAR Output
Bowtie2 alignment output against rRNA (I would expect about 30% of reads to align to rRNA)
526556 reads; of these:
  526556 (100.00%) were unpaired; of these:
    316397 (60.09%) aligned 0 times
    59203 (11.24%) aligned exactly 1 time
    150956 (28.67%) aligned >1 times
39.91% overall alignment rate

841821 reads; of these:
  841821 (100.00%) were unpaired; of these:
    456254 (54.20%) aligned 0 times
    112594 (13.38%) aligned exactly 1 time
    272973 (32.43%) aligned >1 times
45.80% overall alignment rate
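The two Bowtie2 summaries can be turned into an rRNA fraction, for comparison with the ~30% expected, and a count of reads handed on to STAR. This is a small sketch that assumes Bowtie2's single-end summary format exactly as above; the function names are hypothetical. On these logs it gives about 39.9% rRNA (316,397 reads kept) and 45.8% rRNA (456,254 reads kept).

```python
import re


def _first_int(pattern: str, text: str) -> int:
    """Return the first captured integer for `pattern`, or fail loudly."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found in Bowtie2 log: {pattern}")
    return int(match.group(1))


def bowtie2_rrna_summary(log: str) -> tuple[int, int, float]:
    """Return (total reads, reads not aligned to rRNA, fraction aligned to rRNA)."""
    total = _first_int(r"(\d+) reads; of these:", log)
    kept = _first_int(r"(\d+) \([\d.]+%\) aligned 0 times", log)
    return total, kept, (total - kept) / total


if __name__ == "__main__":
    sample1 = """526556 reads; of these:
  526556 (100.00%) were unpaired; of these:
    316397 (60.09%) aligned 0 times
    59203 (11.24%) aligned exactly 1 time
    150956 (28.67%) aligned >1 times
39.91% overall alignment rate"""
    total, kept, rrna = bowtie2_rrna_summary(sample1)
    print(f"{total} reads, {kept} kept for STAR, {rrna:.1%} rRNA")
```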
For adapter trimming:
fastx_clipper -i input -a AGATCGGAAGAGCAC -l 20 -Q33 -c -n -v -o output
(AGATCGGAAGAGCAC is the constant linker sequence)
cutadapt --report=minimal -u 2 -m 16 -O 8 -a PEA1=NNNNNATCGT -o output input
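To make the cutadapt flags concrete, here is a simplified Python model of what `-u 2 -O 8 -m 16 -a NNNNNATCGT` asks for: drop the first 2 nt, remove the linker from the 3' end (N matches any base, and a partial match at the very end counts only if at least 8 nt overlap), then discard reads shorter than 16 nt. It is an assumption-laden sketch, not cutadapt's algorithm: it uses exact matching and takes the leftmost hit, whereas cutadapt tolerates errors (10% by default) and scores alignments. The function names are hypothetical.

```python
from typing import Optional


def matches(segment: str, pattern: str) -> bool:
    """Position-by-position comparison where 'N' in the pattern matches any base."""
    return len(segment) == len(pattern) and all(
        p == "N" or p == s for s, p in zip(segment, pattern))


def trim_3prime(read: str, adapter: str, min_overlap: int) -> str:
    """Cut at the leftmost exact adapter match (partial matches only at the 3' end)."""
    for i in range(len(read)):
        overlap = min(len(adapter), len(read) - i)
        if overlap < min_overlap:
            break  # every remaining suffix is too short to count as adapter
        if matches(read[i:i + overlap], adapter[:overlap]):
            return read[:i]
    return read


def process_read(read: str, adapter: str = "NNNNNATCGT", cut5: int = 2,
                 min_overlap: int = 8, min_length: int = 16) -> Optional[str]:
    """Return the trimmed read, or None if it ends up shorter than min_length."""
    read = read[cut5:]                              # like cutadapt -u 2
    read = trim_3prime(read, adapter, min_overlap)  # like -a ... -O 8
    return read if len(read) >= min_length else None  # like -m 16


if __name__ == "__main__":
    footprint = "ACGTACGTACGTACGTACGTACGTACGT"  # made-up 28 nt insert
    print(process_read("TT" + footprint + "GGGGGATCGT"))  # linker removed
```

Reads with no linker are kept unchanged here, as cutadapt does by default; fastx_clipper's -c would instead discard reads in which the adapter was not found.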