junchaoshi / sports1.1

Small non-coding RNA annotation Pipeline Optimized for rRNA- and tRNA-Derived Small RNAs
GNU General Public License v3.0

Understanding the pre-compiled databases #19

Closed dktanwar closed 3 years ago

dktanwar commented 3 years ago

Hi,

I am trying to understand how the pre-compiled databases were obtained.

-rRNA database (Original source: https://www.ncbi.nlm.nih.gov/nuccore)

I am not sure how exactly rRNAs were obtained. Does one just have to search for the species and take all the fasta sequences?

-mitotRNAdb database [6] (Original source: http://mttrna.bioinf.uni-leipzig.de/mtDataOutput/)

It looks like there are only 22 mt_tRNAs for mouse: http://mttrna.bioinf.uni-leipzig.de/mtDataOutput/Result

However, the pre-compiled database contains 45.

Thank you for clarifying.

junchaoshi commented 3 years ago

Hi,

For rRNAs, one sequence for each kind of rRNA was manually/arbitrarily selected.

For mito-tRNAs, sequences from Mus musculus subspecies are also included, such as Mus musculus castaneus, Mus musculus molossinus, and Mus musculus musculus, and duplicate sequences are removed.

Best, Junchao
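
For readers who want to reproduce something similar, the merge-and-deduplicate step described above for mito-tRNAs could be sketched roughly as follows. This is only an illustration, not the script used to build the SPORTS1.1 databases: the function names `parse_fasta` and `dedupe_by_sequence`, the header format, and the toy sequences are all made up for the example, and it assumes "duplicate" means an identical sequence string after upper-casing.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_by_sequence(
    records: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep the first record seen for each distinct sequence.

    Assumption: two records are duplicates when their sequences are
    identical after upper-casing. Headers are not compared.
    """
    first_header: Dict[str, str] = {}
    for header, seq in records:
        key = seq.upper()
        if key not in first_header:
            first_header[key] = header
    # dicts preserve insertion order, so output follows input order.
    return [(header, seq) for seq, header in first_header.items()]


# Toy input: hypothetical headers and sequences, not real mito-tRNAs.
toy = """>Mus_musculus|tRNA-A
ACGTACGT
>Mus_musculus_castaneus|tRNA-A
ACGTACGT
>Mus_musculus_molossinus|tRNA-A
ACGTACGA
>Mus_musculus_musculus|tRNA-B
GGCC
TTAA
""".splitlines()

unique = dedupe_by_sequence(parse_fasta(toy))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

In this toy input the castaneus record is dropped because its sequence matches the first one, while the molossinus record is kept because it differs by one base. Pooling subspecies entries in this way is how a merged set can end up larger than the per-species listing on the source site.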