tseemann / barrnap

:microscope: :leo: Bacterial ribosomal RNA predictor
GNU General Public License v3.0

Mycobacteria & 5S rRNA #36

Open ppgardne opened 5 years ago

ppgardne commented 5 years ago

Hi Torsten,

Great job on a wonderful tool. ;-)

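I have just noticed that barrnap seems to miss part of the Mycobacterium tuberculosis (TB) 5S rRNA sequence. It reports only a partial hit covering positions 5..80 of the 115 nt input: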

[barrnap] Found: 5S_rRNA AL123456; L=76/119 5..80 + 5S ribosomal RNA (partial)
[barrnap] Found 1 ribosomal RNA features.
[barrnap] Sorting features and outputting GFF3...
##gff-version 3
AL123456;   barrnap:0.9 rRNA    5   80  3.5e-14 +   .   Name=5S_rRNA;product=5S ribosomal RNA (partial);note=aligned only 63 percent of the 5S ribosomal RNA
[barrnap] Done.
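
The "63 percent" in the note is consistent with the `L=76/119` in the log line: the hit spans 5..80 (76 positions) against a model length of 119. A minimal Python sketch of that arithmetic follows. The function name `aligned_percent` is hypothetical, and it assumes tab-separated GFF3 columns and that the percentage is truncated rather than rounded, which is a guess from the reported value and not confirmed from the barrnap source.

```python
def aligned_percent(gff_line: str, model_length: int) -> int:
    """Return the truncated percentage of the model covered by a GFF3 hit.

    Assumes standard GFF3 column order: start is column 4, end is column 5.
    """
    fields = gff_line.rstrip("\n").split("\t")
    start, end = int(fields[3]), int(fields[4])
    hit_length = end - start + 1          # GFF3 coordinates are 1-based, inclusive
    return (100 * hit_length) // model_length


# Coordinates and model length taken from the barrnap output above.
line = "AL123456;\tbarrnap:0.9\trRNA\t5\t80\t3.5e-14\t+\t.\tName=5S_rRNA"
print(aligned_percent(line, 119))
```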

Input sequence:
>TB-5S-rRNA    AL123456; Mycobacterium tuberculosis H37Rv complete genome.
UUACGGCGGCCACAGCGGCAGGGAAACGCCCGGUCCCAUUCCGAACCCGG
AAGCUAAGCCUGCCAGCGCCGAUGAUACUGCCCCUCCGGGUGGAAAAGUA
GGACACCGCCGAACA

Can you reproduce this? I built a HMMER model from an older Rfam RF00001 SEED alignment, and it recovers nearly the full length of the same sequence (positions 5..110, see below). There are definitely Mycobacteria sequences in the seed, so a model built from it should handle this sequence.

For example, the HMMER hit on the same sequence:

>> AL123456;  Mycobacterium tuberculosis H37Rv complete genome.
    score  bias    Evalue   hmmfrom    hmm to     alifrom    ali to      envfrom    env to       sq len      acc
   ------ ----- ---------   -------   -------    --------- ---------    --------- ---------    ---------    ----
 !   53.5  10.9   1.4e-18         4       116 ..         5       110 ..         2       114 ..       115    0.92

  Alignment:
  score: 53.5 bits
                <<<<<<....<<.<<<<<...<<..<<<<<<.......>>..>>>>..>>....>>>>>..>><<<.<<....<.<<.....<<....>>.....>>.>. CS
       SEED   4 ggcggccauagcgggggggaaacacccgauccCaUcccGaacucggaaguuAAgccccuuagcgccgauguagUAcugcggugggugaccacgugggAau 103
                ggcggcca agcgg  gggaaac cccg uccCaU+ccGaac cggaag uAAgcc+  +agcgccgaug   UAcugc        +cc  gugg Aa 
  AL123456;   5 GGCGGCCACAGCGGCAGGGAAACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUG--AUACUGCCC-----CUCCGGGUGGAAA- 96 
                789*******************************************************************..6****9874.....3578889999999. PP

                .>>.>>>.>>>>>> CS
       SEED 104 aguaggu.gcugcc 116
                aguagg+  c+gcc
  AL123456;  97 AGUAGGAcACCGCC 110
                ***99988888876 PP
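
To put the two results side by side, here is a rough Python sketch that computes how much of the input sequence each hit covers. The coordinates are copied from the two outputs above (5..80 for barrnap, 5..110 for the HMMER model); the helper name `query_coverage` is made up for illustration.

```python
# The 5S rRNA sequence exactly as submitted in this issue.
SEQ = (
    "UUACGGCGGCCACAGCGGCAGGGAAACGCCCGGUCCCAUUCCGAACCCGG"
    "AAGCUAAGCCUGCCAGCGCCGAUGAUACUGCCCCUCCGGGUGGAAAAGUA"
    "GGACACCGCCGAACA"
)


def query_coverage(start: int, end: int, seq_len: int) -> float:
    """Percentage of the query covered by a 1-based, inclusive hit interval."""
    return 100.0 * (end - start + 1) / seq_len


seq_len = len(SEQ)
hits = {"barrnap": (5, 80), "HMMER (RF00001 SEED)": (5, 110)}
for name, (start, end) in hits.items():
    print(f"{name}: {start}..{end} = {query_coverage(start, end, seq_len):.1f}% of {seq_len} nt")
```

Both hits start at position 5; the difference is entirely at the 3' end, where the barrnap hit stops 30 positions short of the HMMER one.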

Thanks for your time! Paul & Helena.