Just to add here: when I tried the online web tool, it did annotate this entry with gene=ompC. However, I see that both the software and db versions are different:
https://bakta.computational.bio/job/eyJqb2JJRCI6IjA2YjI2NmZiLWRiZGQtNGY5My05YzFkLTFiYWUwNTEwYWM3YiIsInNlY3JldCI6Ikg3SUhrVDFhRzQ1Qk1vREVKODJybzdMT3ltZ29TSE5ZQVRXTjBNUmdFNmcifQ==
Hi @ebraginngd, thanks for reaching out with this. In principle, and based on the information provided above, this is not a bug, but just the occurrence of two different genes having nearly identical functional descriptions.
As you can see in the Dbxrefs, the first is a member of the UniRef50_P06996 protein cluster that is annotated with the gene symbol ompC, while the second is a member of the UniRef50_Q56828 protein cluster without any gene symbol annotation.
Since these are members of two different UniRef50 clusters, we can be sure that they have a mutual sequence identity of at most 50 %, which is fairly low. Hence, it could simply be the case that these are in fact two different genes.
... OR it could simply be the very common case that one protein cluster in UniRef is better annotated than others.
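To check this across a whole annotation, a small script can list the gene symbol and the UniRef50 cluster for each CDS. The following is only a minimal sketch and not part of Bakta: the function names are made up for illustration, it assumes a standard tab-separated GFF3 file with the attributes in column 9, and the two test records are the ones from this issue with their Dbxref lists shortened.

```python
# Sketch: report locus tag, gene symbol and UniRef50 cluster for GFF3 CDS lines.
# Assumption: standard GFF3, 9 tab-separated columns, attributes in column 9.


def parse_attributes(column9):
    """Turn 'key=value;key=value' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def gene_and_uniref50(gff_line):
    """Return (locus_tag, gene or None, UniRef50 ID or None) for a CDS line.

    Returns None for lines that are not 9-column CDS records.
    """
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2] != "CDS":
        return None
    attrs = parse_attributes(columns[8])
    dbxrefs = attrs.get("Dbxref", "").split(",")
    uniref50 = next(
        (x.split(":", 1)[1] for x in dbxrefs if x.startswith("UniRef:UniRef50_")),
        None,
    )
    return attrs.get("locus_tag"), attrs.get("gene"), uniref50


# The two records from this issue (Dbxref lists shortened for readability).
records = [
    "\t".join([
        "contig_104", "Prodigal", "CDS", "2607", "3707", ".", "-", "0",
        "ID=LIIJEP_24530;Name=Outer membrane porin C 2;locus_tag=LIIJEP_24530;"
        "product=Outer membrane porin C 2;"
        "Dbxref=RefSeq:WP_000768393.1,UniRef:UniRef50_P06996;gene=ompC2",
    ]),
    "\t".join([
        "contig_2", "Prodigal", "CDS", "40084", "41178", ".", "+", "0",
        "ID=LIIJEP_01180;Name=Outer membrane porin C;locus_tag=LIIJEP_01180;"
        "product=Outer membrane porin C;"
        "Dbxref=RefSeq:WP_000865539.1,UniRef:UniRef50_Q56828",
    ]),
]

for line in records:
    print(gene_and_uniref50(line))
```

A CDS that comes back with `None` as its gene symbol can then be looked up by its UniRef50 ID to see whether the cluster itself carries a gene name.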
I hope this helps to clarify this a bit. If not, please do not hesitate to keep asking.
Dear @oschwengers, thanks very much for this amazing annotation tool. I wonder if the following is a bug or maybe we are using it wrong:
We tried annotating an E. coli assembly with the following command:
bakta --db /mnt/db-full/ -o /mnt/out_path -t 2 assembly.fasta
with the latest Docker image oschwengers/bakta:latest
We see that some genes have gene= short names and some don't. Is there a way to enforce short names? Interestingly, for the same gene (Outer membrane porin C), of which there are two slightly different copies, one was annotated with the short name:
contig_104 Prodigal CDS 2607 3707 . - 0 ID=LIIJEP_24530;Name=Outer membrane porin C 2;locus_tag=LIIJEP_24530;product=Outer membrane porin C 2;Dbxref=COG:COG3203,COG:M,GO:0009279,GO:0015288,GO:0034220,GO:0046930,KEGG:K16076,RefSeq:WP_000768393.1,SO:0001217,UniParc:UPI00016A10FE,UniRef:UniRef100_A0A0D8WD33,UniRef:UniRef50_P06996,UniRef:UniRef90_A0A4P7TME1;gene=ompC2
and one without:
contig_2 Prodigal CDS 40084 41178 . + 0 ID=LIIJEP_01180;Name=Outer membrane porin C;locus_tag=LIIJEP_01180;product=Outer membrane porin C;Dbxref=RefSeq:WP_000865539.1,SO:0001217,UniParc:UPI00000B81BE,UniRef:UniRef100_Q9K597,UniRef:UniRef50_Q56828,UniRef:UniRef90_Q9K597
The sample in question is DRR387971.