junchaoshi / sports1.1

Small non-coding RNA annotation Pipeline Optimized for rRNA- and tRNA-Derived Small RNAs
GNU General Public License v3.0

tsRNA annotation results #17

Closed daixiaozhuan closed 3 years ago

daixiaozhuan commented 3 years ago

Hi, thanks for your wonderful work. It's very convenient to use sports1.1 to annotate small RNA-seq data. Now that I have the results, I have a couple of questions:

  1. For tsRNA results, I saw three types: *_5_end, *_3_end and *_CCA_end. What's the difference between 3_end and CCA_end?

  2. Also, some tRNAs don't belong to any of these three types, for example "tRNA-Val-AAC". How should I deal with these if I want to analyze tsRNAs in my samples?

Looking forward to your reply! Thanks, Xiaozhuan

junchaoshi commented 3 years ago

Hi Xiaozhuan,

  1. The seqs annotated as _CCA_end contain the CCA tail of mature tRNAs, while those annotated as _3_end come from the 3' end of the tRNA without the CCA tail.
  2. Seqs annotated as tRNA without *_end information may come from the middle region of tRNAs. Hope the information helps.

My best, Junchao

daixiaozhuan commented 3 years ago

Hi Junchao,

Thanks for your quick reply, that's really helpful.

I want to make sure: does a 3' end without the CCA end mean the read comes from the precursor tRNA? And how do you determine 5' end or 3' end? I mean, what is the exact breakpoint or position used to classify reads as 5' end, 3' end or internal tsRNAs?

Best, Xiaozhuan

junchaoshi commented 3 years ago

Hi Xiaozhuan,

Hope the example below helps:

pre tRNA seq: GGGGGTATAGCTCAGGGGTAGAGCATTTGACTGCAGATCAAGAGGTCCCTGGTTCAAATCCGGGTGCCCCCT

mature tRNA seq: GGGGGTATAGCTCAGGGGTAGAGCATTTGACTGCAGATCAAGAGGTCCCTGGTTCAAATCCGGGTGCCCCCT CCA

5 end tsRNA: GGGGGTATAGCTCAGGGGTAGAGC

3 end tsRNA: TTCAAATCCGGGTGCCCCCT

CCA end tsRNA: TTCAAATCCGGGTGCCCCCT CCA

other tsRNA: GGGGTATAGCTCAGGGGTAGAGCA, TGACTGCAGATCAAGAGGT, TCAAATCCGGGTGCCC
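The categories in this example can be reproduced with a few exact-match checks. The sketch below is only an illustration of the example, not the SPORTS1.1 implementation: the function name `classify_tsrna` and the `no_match` label are hypothetical, and it ignores mismatches and multi-mapping.

```python
# Illustrative only: exact-match classification of a read against one tRNA.
# NOT the SPORTS1.1 code; all names here are hypothetical.

PRE_TRNA = "GGGGGTATAGCTCAGGGGTAGAGCATTTGACTGCAGATCAAGAGGTCCCTGGTTCAAATCCGGGTGCCCCCT"
MATURE_TRNA = PRE_TRNA + "CCA"  # mature form = precursor plus the CCA tail


def classify_tsrna(read: str, pre: str = PRE_TRNA, mature: str = MATURE_TRNA) -> str:
    """Return '5_end', 'CCA_end', '3_end', 'other' or 'no_match' for a read."""
    if mature.startswith(read):
        return "5_end"    # read begins at the first base of the tRNA
    if mature.endswith(read):
        return "CCA_end"  # read runs to the end of the CCA tail
    if pre.endswith(read):
        return "3_end"    # read ends at the 3' end, without CCA
    if read in mature:
        return "other"    # read lies somewhere inside the tRNA
    return "no_match"


for seq in (
    "GGGGGTATAGCTCAGGGGTAGAGC",
    "TTCAAATCCGGGTGCCCCCT",
    "TTCAAATCCGGGTGCCCCCTCCA",
    "TGACTGCAGATCAAGAGGT",
):
    print(seq, classify_tsrna(seq))
```

Note that GGGGTATAGCTCAGGGGTAGAGCA starts at the second base of the tRNA, so under these checks it falls into "other" rather than 5_end.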

All the best, Junchao

daixiaozhuan commented 3 years ago

Hi Junchao,

This example is super awesome!

But why do some tRNAs not end with CCA? Are they precursors? I thought all mature tRNAs have the CCA end, right?

One more question: if one sequence maps to multiple tRNAs, how do you handle it in further analysis?

best, Xiaozhuan

junchaoshi commented 3 years ago

Hi Xiaozhuan,

In theory, precursor tRNAs may not have the CCA end, so we add an extra CCA end to the precursor tRNA seqs to generate the mature tRNA seqs.

As stated in the paper, "read number of sequences from multiple matching loci are uniformly distributed (based on the assumption that each of these multiple sites equally expresses RNAs)". If a seq with 3 reads can match two types of tRNA, it will contribute 1.5 reads to each type.

My best, Junchao

daixiaozhuan commented 3 years ago

Hi Junchao,

Based on what you said, an extra CCA end is added to all precursor tRNAs. Then where can a 3' end without CCA come from? I'm a little confused about this.

For the second question, take the example below:

t00000001 GCATTGGTGGTTCAGTGGTAGAATTCTCGC 30 266781 Yes mature-tRNA-Gly-GCC_5_end;mature-tRNA-Gly-CCC_5_end

So, do mature-tRNA-Gly-GCC_5_end and mature-tRNA-Gly-CCC_5_end each get half of the total reads (266781/2)?

Thanks, Xiaozhuan

junchaoshi commented 3 years ago

Hi Xiaozhuan,

A tsRNA is not a tRNA. 3' end tsRNAs can be thought of as mapping to mature tRNAs without the CCA end. One possible explanation is that the CCA end was somehow lost after such tsRNAs were generated.

No, "read number of sequences from multiple matching loci are uniformly distributed". In this case, GCATTGGTGGTTCAGTGGTAGAATTCTCGC could map to 10 loci of tRNA-Gly-GCC and 2 loci of tRNA-Gly-CCC with a 1-mismatch tolerance, so the read number is split into 10/12 * 266781 and 2/12 * 266781.
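The per-locus split can be sketched as follows. This is a minimal illustration of the arithmetic described here, not code from SPORTS1.1; the function name `split_reads` and the way loci are listed are assumptions made for the example.

```python
from collections import Counter


def split_reads(read_count, locus_types):
    """Spread read_count uniformly over matching loci, then sum per tRNA type.

    locus_types holds one entry per matching locus, naming the tRNA type
    that locus belongs to (hypothetical input format).
    """
    per_locus = read_count / len(locus_types)
    shares = Counter()
    for trna_type in locus_types:
        shares[trna_type] += per_locus
    return dict(shares)


# 10 loci of tRNA-Gly-GCC and 2 loci of tRNA-Gly-CCC, 266781 reads in total
loci = ["tRNA-Gly-GCC"] * 10 + ["tRNA-Gly-CCC"] * 2
shares = split_reads(266781, loci)
print(shares)  # tRNA-Gly-GCC gets 222317.5 reads, tRNA-Gly-CCC gets 44463.5
```

The two shares add back up to 266781, and a seq matching one locus of each of two types would get the 50/50 split.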

Junchao

SergioRodLla commented 2 months ago

Hi Junchao,

Just to clarify one thing. In a previous post in this thread you mention that seqs annotated as tRNA without *_end information may come from the middle region of tRNAs. I was wondering if this also applies to the annotations found in the output length_distribution.txt files. I'm asking because, looking at source/overall_RNA_length_distribution.R, I see that to get the number of different tsRNAs you do:

dis.tRNA   <- length.dis[grep("tRNA_Match_Genome|tRNA_Unmatch_Genome", length.dis$name, ignore.case = TRUE), 2:3]
dis.tRNA.5.end    <- length.dis[grep("tRNA_5_end_Match_Genome|tRNA_5_end_Unmatch_Genome", length.dis$name, ignore.case = TRUE), 2:3]
dis.tRNA.3.end    <- length.dis[grep("tRNA_3_end_Match_Genome|tRNA_3_end_Unmatch_Genome", length.dis$name, ignore.case = TRUE), 2:3]
dis.tRNA.CCA.end  <- length.dis[grep("tRNA_CCA_end_Match_Genome|tRNA_CCA_end_Unmatch_Genome", length.dis$name, ignore.case = TRUE), 2:3]
.
.
.
dis.tRNA   <- data.frame(name = "tsRNA", length.combine(dis.tRNA, len))
dis.tRNA.5.end   <- data.frame(name = "tsRNA-5'end", length.combine(dis.tRNA.5.end, len))
dis.tRNA.3.end   <- data.frame(name = "tsRNA-3'end", length.combine(dis.tRNA.3.end, len))
dis.tRNA.CCA.end <- data.frame(name = "tsRNA-CCA end", length.combine(dis.tRNA.CCA.end, len))
dis.tRNA.other   <- data.frame(name = "tsRNA-other", length = len, reads = dis.tRNA$reads - dis.tRNA.5.end$reads - dis.tRNA.3.end$reads - dis.tRNA.CCA.end$reads)

Does this mean that in this case the tRNA category without *_end information is the sum of 3' + 5' + CCA + middle region (dis.tRNA.other)? If so, should these annotations be interpreted differently from the ones in the _output.txt files?

junchaoshi commented 2 months ago

Hi SergioRodLla,

In the _length_distribution.txt file, for example, the reads in the GtRNAdb-tRNA_Match_Genome category include the reads of GtRNAdb-tRNA_5_end_Match_Genome, GtRNAdb-tRNA_3_end_Match_Genome, GtRNAdb-tRNA_CCA_end_Match_Genome, and the tsRNA reads from the tRNA middle region. In the _output.txt file, tsRNAs that are annotated without a 5_end, 3_end, or CCA_end are indicated as not originating from the terminal regions of the corresponding tRNAs.
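To make the relationship concrete, here is a small sketch of the subtraction per read length. The counts are made-up toy numbers and the function name `tsrna_other` is hypothetical; it only mirrors the idea that the tRNA total minus the three terminal categories leaves the middle-region reads.

```python
def tsrna_other(total, five_end, three_end, cca_end):
    """Per-length reads left after removing the three terminal categories.

    Each argument maps read length -> read count; a length missing from a
    terminal category is treated as 0 reads.
    """
    return {
        length: total[length]
        - five_end.get(length, 0)
        - three_end.get(length, 0)
        - cca_end.get(length, 0)
        for length in total
    }


# Toy counts keyed by read length (not real data)
total = {30: 100.0, 31: 40.0}      # e.g. a tRNA_Match_Genome-style total
five_end = {30: 60.0, 31: 10.0}
three_end = {30: 5.0}
cca_end = {30: 15.0, 31: 20.0}

other = tsrna_other(total, five_end, three_end, cca_end)
print(other)  # {30: 20.0, 31: 10.0}
```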

I hope this information helps.

Junchao