Closed quocviet0908 closed 4 years ago
They are both in the IS91 family. Prokka does protein-based comparison and only annotates the transposase. From your coordinates (11247 to 11822), your CDS is under 600 bp long, whereas the full gene is 389 aa (>1100 nt). This means you have only a partial gene, so Prokka probably can't tell which of the two you have.
IS1294 --------------VCCTANQCWTSFLDAGGLRDIEVEAVTKMLACRTRILGVKEFGCDN
ISSbo1 MTRSGGDFQPRPLKRLFTANQCWTSFLDAGGLRDIGVEAVTKMLACGTRILGVKEYICDK
****************** ********** ********: **:
IS1294 PDCQHVKYLTNSCGSRACPSCGKKATDLWTATQLNRLPDCDWVHLVFTLPDTLWPVFESN
ISSbo1 PECPHVRYVTNSCGSRACPSCGKKATDLWIATQLNRLPDCDWVHLVFTLPDTLWPVFESN
*:* **.*:******************** ******************************
IS1294 RWLLNDVCRLAVENLLYAARKRGLEPGIFCAIHTYGRRLNWHPHVHVSVTCGGLNKHGQW
ISSbo1 RWLLNDVCRLAVENLLYAARKRGQEPGIFCAIHTYGRRLNWHPHVHVSVTCGGLNKHGQW
*********************** ************************************
IS1294 KKLSFLKDAMRSRWMWNMRQLLLKAWSEGMAMPESLSHITTESQWRSLVLKSGGKYWHVY
ISSbo1 KKLSFLKDSMRSRWMWNMRQLLLKAWSEGLAMPESLSHITTESQWRSLVLKAGGKYWHVY
********:********************:*********************:********
IS1294 MSKKTAGGRNTARYLGRYLKKPPIAASRLAHYNGGASLSFRYLDHKTGETATETLTQREL
ISSbo1 MSKKTAGGRNTARYLGRYLKKPPIAASRLAHYNVGASLNFRYLDHKTGETATETLTQREL
********************************* ****.*********************
IS1294 VARLKQHIPEKFFKMVRYFGFLANRVCGEKLPQVYRALGMDKPEPGRKCAMHKWMVKQFL
ISSbo1 VARLKQHIPEKFFKMVRYFGFLANRVCGEKLPQVYRALGMDKQEPVAKVCYAQ-MVKQFL
****************************************** ** * . : ******
IS1294 SRDPFECVLCGCRMVYRRAIAGLNVSGLKKNARDISLLRYMPA
ISSbo1 SRDSFECVLCGGRMVYRRAIAGLNVEGLKKNARDISLLRYMPA
***.******* *************.*****************
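The partial-gene argument can be checked with a little arithmetic. The sketch below is only an illustration: it takes the start and end coordinates from the GFF line in the question and the 389 aa full-length figure from the reply, and the function names are made up for this example rather than being part of Prokka.

```python
# Coordinates copied from the GFF line in the question (1-based, inclusive).
CDS_START = 11247
CDS_END = 11822
# Full-length transposase size quoted in the reply.
FULL_LENGTH_AA = 389


def cds_length_nt(start, end):
    """Length in nucleotides of a feature with inclusive 1-based coordinates."""
    return end - start + 1


def max_codons(length_nt):
    """Upper bound on the number of codons that fit in the feature."""
    return length_nt // 3


def fraction_of_full_gene(length_nt, full_length_aa):
    """Rough fraction of the full-length protein the feature could encode."""
    return max_codons(length_nt) / full_length_aa


length_nt = cds_length_nt(CDS_START, CDS_END)
print("CDS length (nt):", length_nt)
print("Codons at most:", max_codons(length_nt))
print("Nucleotides needed for 389 aa:", FULL_LENGTH_AA * 3)
print("Fraction of full gene:", round(fraction_of_full_gene(length_nt, FULL_LENGTH_AA), 2))
```

The predicted CDS covers about half of the full-length transposase at most, which fits the explanation that a partial gene cannot be assigned reliably to one of two near-identical family members.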
@tseemann Thanks so much. I will look for other ways to solve this.
Hi there, I tried to annotate my E. coli contig to reconstruct the gene context. However, when I looked at my annotation result, I noticed that one IS element was named ISSbo1 instead of IS1294. ISSbo1 originates from Shigella boydii, while IS1294 is from E. coli.
I extracted the sequence of that gene based on the .gff file and ran blastn against ISfinder; the top-scoring hit was IS1294. The corresponding entry in the .gff file, however, looks like this:
contig00051 Prodigal:002006 CDS 11247 11822 . + 0 ID=28420_4#72_03748;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:ISfinder:ISSbo1;locus_tag=28420_4#72_03748;product=IS91 family transposase ISSbo1
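For anyone wanting to repeat the extraction step, here is a minimal sketch of pulling a feature's nucleotide sequence out of a contig from GFF-style coordinates. It assumes the contig sequence is already loaded as a plain string; `extract_feature` and `reverse_complement` are hypothetical helper names written for this example, not functions from Prokka or ISfinder.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T, upper or lower case)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def extract_feature(contig_seq, start, end, strand):
    """Return the feature sequence for 1-based inclusive GFF coordinates.

    Features on the '-' strand are returned reverse-complemented so the
    sequence reads in the direction of the gene.
    """
    sub = contig_seq[start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub


# Toy contig used only to show the call; a real contig would come from the FASTA file.
toy_contig = "AAACCCGGGTTT"
print(extract_feature(toy_contig, 2, 5, "+"))
print(extract_feature(toy_contig, 2, 5, "-"))
```

For the entry above, the call would use start 11247, end 11822, and strand "+" on the contig00051 sequence.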
This is the command I used for prokka: prokka --kingdom Bacteria --genus Escherichia --species coli --outdir ./isolate_ID isolate_ID.fa --prefix isolate_ID --locustag isolate_ID --cpus 8 --force
Is there anything I could do to improve my annotation results? I would really like IS1294 to be annotated correctly, with its proper gene direction.
Thanks in advance.