tseemann / prokka

Rapid prokaryotic genome annotation

Annotation of IS element from E. coli #496

Closed by quocviet0908 4 years ago

quocviet0908 commented 4 years ago

Hi there, I tried to annotate my E. coli contig in order to reconstruct the gene context. However, when I looked at the annotation results, I noticed that one IS element was named ISSbo1 instead of IS1294. ISSbo1 originates from Shigella boydii, while IS1294 is from E. coli.

I extracted the sequence of that gene based on the .gff file and ran blastn against ISfinder; IS1294 came back as the top-scoring hit. The corresponding entry in the .gff file, however, looks like this:

contig00051 Prodigal:002006 CDS 11247 11822 . + 0 ID=28420_4#72_03748;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISSbo1;locus_tag=28420_4#72_03748;product=IS91 family transposase ISSbo1

This is the command I used for prokka: prokka --kingdom Bacteria --genus Escherichia --species coli --outdir ./isolate_ID isolate_ID.fa --prefix isolate_ID --locustag isolate_ID --cpus 8 --force

Is there anything I could do to improve my annotation results? I really want to have the right annotation of IS1294 with its proper gene direction.

Thanks in advance.

tseemann commented 4 years ago

They are both in the IS91 family. Prokka does protein-based comparison and only annotates the transposase. From your coordinates, your CDS is only 576 bp long (11247 to 11822), whereas the full gene is 389 aa (>1100 nt). This means you have only a partial gene, so Prokka probably can't tell which of the two you have.

IS1294          --------------VCCTANQCWTSFLDAGGLRDIEVEAVTKMLACRTRILGVKEFGCDN
ISSbo1          MTRSGGDFQPRPLKRLFTANQCWTSFLDAGGLRDIGVEAVTKMLACGTRILGVKEYICDK
                                 ****************** ********** ********: **:

IS1294          PDCQHVKYLTNSCGSRACPSCGKKATDLWTATQLNRLPDCDWVHLVFTLPDTLWPVFESN
ISSbo1          PECPHVRYVTNSCGSRACPSCGKKATDLWIATQLNRLPDCDWVHLVFTLPDTLWPVFESN
                *:* **.*:******************** ******************************

IS1294          RWLLNDVCRLAVENLLYAARKRGLEPGIFCAIHTYGRRLNWHPHVHVSVTCGGLNKHGQW
ISSbo1          RWLLNDVCRLAVENLLYAARKRGQEPGIFCAIHTYGRRLNWHPHVHVSVTCGGLNKHGQW
                *********************** ************************************

IS1294          KKLSFLKDAMRSRWMWNMRQLLLKAWSEGMAMPESLSHITTESQWRSLVLKSGGKYWHVY
ISSbo1          KKLSFLKDSMRSRWMWNMRQLLLKAWSEGLAMPESLSHITTESQWRSLVLKAGGKYWHVY
                ********:********************:*********************:********

IS1294          MSKKTAGGRNTARYLGRYLKKPPIAASRLAHYNGGASLSFRYLDHKTGETATETLTQREL
ISSbo1          MSKKTAGGRNTARYLGRYLKKPPIAASRLAHYNVGASLNFRYLDHKTGETATETLTQREL
                ********************************* ****.*********************

IS1294          VARLKQHIPEKFFKMVRYFGFLANRVCGEKLPQVYRALGMDKPEPGRKCAMHKWMVKQFL
ISSbo1          VARLKQHIPEKFFKMVRYFGFLANRVCGEKLPQVYRALGMDKQEPVAKVCYAQ-MVKQFL
                ****************************************** **  * .  : ******

IS1294          SRDPFECVLCGCRMVYRRAIAGLNVSGLKKNARDISLLRYMPA
ISSbo1          SRDSFECVLCGGRMVYRRAIAGLNVEGLKKNARDISLLRYMPA
                ***.******* *************.*****************
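
The length argument can be checked directly from the GFF coordinates quoted in the question. The following is a minimal Python sketch, not part of Prokka: the function names are hypothetical, the coordinates and the 389 aa full length are taken from this thread, and the two sequence rows are one block copied from the alignment above. It assumes GFF coordinates are 1-based and inclusive, and it treats the codon count as an upper bound on the residue count because the CDS may include a stop codon.

```python
# Coordinates, the 389 aa full length, and the two aligned rows are copied
# from this thread; the function names are hypothetical.


def feature_length_nt(start: int, end: int) -> int:
    """GFF coordinates are 1-based and inclusive, so add 1."""
    return end - start + 1


def count_identical(row_a: str, row_b: str) -> tuple[int, int]:
    """Return (identical columns, compared columns), skipping gap columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += a == b
    return identical, compared


cds_nt = feature_length_nt(11247, 11822)
cds_codons = cds_nt // 3  # upper bound on residues; may include a stop codon
full_length_aa = 389      # full transposase length quoted above
covered = cds_codons / full_length_aa

# One block of the IS1294 / ISSbo1 alignment shown above.
is1294 = "RWLLNDVCRLAVENLLYAARKRGLEPGIFCAIHTYGRRLNWHPHVHVSVTCGGLNKHGQW"
issbo1 = "RWLLNDVCRLAVENLLYAARKRGQEPGIFCAIHTYGRRLNWHPHVHVSVTCGGLNKHGQW"
identical, compared = count_identical(is1294, issbo1)

print(f"CDS length: {cds_nt} nt, at most {cds_codons} codons")
print(f"Fraction of the {full_length_aa} aa protein covered: {covered:.2f}")
print(f"Identical residues in one alignment block: {identical}/{compared}")
```

With roughly half of the protein present and the two transposases differing at only a handful of positions per block, a protein-level best hit can plausibly land on either name.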

quocviet0908 commented 4 years ago

@tseemann Thanks so much. I will look for other ways to solve this.