Open symPiotr opened 9 years ago
Bacteria: The code is used in Entomoplasmatales and Mycoplasmatales (Bove et al. 1989). The situation in the Acholeplasmatales is unclear. Based on a study of ribosomal protein genes, it had been concluded that UGA does not code for tryptophan in plant-pathogenic mycoplasma-like organisms (MLO) and the Acholeplasmataceae (Lim and Sears, 1992) and there seems to be only a single tRNA-CCA for tryptophan in Acholeplasma laidlawii (Tanaka et al. 1989). In contrast, in a study of codon usage in Phytoplasmas, it was found that 30 out of 78 ORFs analyzed translated better with code 4 (UGA for tryptophan) than with code 11 while the remainder showed no differences between the two codes (Melamed et al. 2003). In addition, the coding reassignment of UGA Stop --> Trp can be found in an alpha-proteobacterial symbiont of cicadas: Candidatus Hodgkinia cicadicola (McCutcheon et al. 2009).
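To make the difference between the two codes concrete, here is a minimal sketch in plain Python (not BioPerl or Prokka code). The helper names `build_table` and `translate` are made up for illustration; the two amino-acid strings follow the NCBI layout for translation tables 11 and 4 with bases ordered T, C, A, G, and differ only at TGA.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering of bases within the codon tables

# Amino acids for all 64 codons in NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
AA_CODE_11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
AA_CODE_4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(amino_acids):
    """Map each codon to its amino acid ('*' marks a stop codon)."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))


TABLE_11 = build_table(AA_CODE_11)
TABLE_4 = build_table(AA_CODE_4)


def translate(dna, table):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


orf = "ATGTGATGGTAA"
print(translate(orf, TABLE_11))  # M*W*  (TGA read as a stop)
print(translate(orf, TABLE_4))   # MWW*  (TGA read as tryptophan)
```

An ORF with an internal TGA is truncated under code 11 but reads through under code 4, which is why a silent switch of genetic code changes the annotation.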
@piotrlukasik Piotr: I think the error message is from within BioPerl, but I need to dig deeper. Can you please send the TEST.fasta file to torsten.seemann@gmail.com or just drag it into the comment here on GitHub?
The problem is that when the total input is under 100,000 bp, Prokka runs Prodigal in "meta" mode.
Meta mode seems to have issues with non-default genetic codes.
Prodigal shouldn't be run with meta mode if you know the genome (you can't specify a genetic code with meta mode). The correct way to run a draft genome is not to submit a contig at a time, but to submit a multiple FASTA containing all the contigs (which Prodigal will train on and produce good results). This is covered in detail here:
https://github.com/hyattpd/prodigal/wiki/Advice-by-Input-Type
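For anyone who has one file per contig, a minimal sketch of combining them into a single multiple FASTA before gene calling might look like this. It is plain Python, not part of Prodigal or Prokka, and the function names `merge_fasta` and `total_bp` are hypothetical.

```python
def merge_fasta(texts):
    """Concatenate several FASTA texts into one multi-FASTA string.

    Blank lines are dropped so that records follow each other directly.
    """
    lines = []
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return "\n".join(lines) + "\n"


def total_bp(fasta_text):
    """Sum sequence lengths, ignoring header lines that start with '>'."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


contig_1 = ">contig_1\nACGT\n"
contig_2 = ">contig_2\nGG\nTT"
genome = merge_fasta([contig_1, contig_2])
print(genome, end="")
print(total_bp(genome))  # 8
```

The combined length is the number that matters for the size thresholds discussed in this thread, not the length of any single contig.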
I suppose in a future version I could allow the specification of a genetic code for meta/anonymous mode (and tell it to skip the canned training files that don't match the genetic code). You would still get worse results running individual small contigs through metagenomic/anonymous mode compared to just putting all the contigs of a single genome in one file and using the default mode.
Hi, thank you for this! I guess that my problem was due to the exceptional nature of my study organism: it does use an alternative genetic code, and the complete genome of some strains is less than 100 kb. I don't think that many other people face this issue... Before running Prokka on another set of similar genomes, I will make sure to read through the advice about Prodigal modes carefully!
You can still run Prodigal with default settings on an input of less than 100 kb and specify the genetic code. Less than 100 kb isn't ideal, since there is not as much sequence to train on, but it will still work. 20 kb is the threshold below which the program refuses to run without a training file or metagenomic mode.
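The thresholds mentioned in this thread can be summarised in a small sketch. This is not Prokka's actual logic, only an illustration: the function name `prodigal_args` and the constant names are made up, and the `-i`, `-p` and `-g` options are assumed from Prodigal 2.6 usage, so check `prodigal -h` for your version.

```python
PROKKA_META_CUTOFF_BP = 100_000  # below this, Prokka switched Prodigal to meta mode
PRODIGAL_MIN_TRAIN_BP = 20_000   # below this, Prodigal will not self-train


def prodigal_args(fasta_path, total_bp, gcode=11):
    """Build a Prodigal argument list for one genome.

    Single-genome mode is used whenever Prodigal can train on the input,
    so the genetic code can still be specified for genomes under 100 kb.
    """
    if total_bp < PRODIGAL_MIN_TRAIN_BP:
        # Too short to train on: only meta mode (or a training file) works,
        # and meta mode does not take a genetic code.
        return ["prodigal", "-i", fasta_path, "-p", "meta"]
    return ["prodigal", "-i", fasta_path, "-p", "single", "-g", str(gcode)]


print(prodigal_args("TEST.fasta", 75_000, gcode=4))
print(prodigal_args("TEST.fasta", 15_000, gcode=4))
```

With a 75 kb genome and genetic code 4, this keeps single-genome mode and passes the code through, which is the case that failed when meta mode was selected below 100,000 bp.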
Hi, I am working with very small bacterial genomes, and when trying to annotate the smallest of them using the latest Prokka I came upon a curious error.
Contigs above a certain length (120kb is OK) annotate correctly. But when the total length of contig/contigs in the input fasta file is below some threshold (75kb is too small), Prokka produces a series of warning messages:
--------------------- WARNING ---------------------
MSG: Seq [Contig_ID]: Not using a valid terminator codon!
...and seems to change the genetic code, leading to incorrect annotation. However, increasing the size of the input fasta file - either by duplicating all contigs, or by attaching a long polyA tail to one of the sequences - fixes the problem: annotation is then correct. Likewise, if the 120kb contig which annotates correctly is cropped so that only half remains, the problem described above appears.
This is the command that I have been using:
prokka1 --force --outdir /my_path/TEST_annotation --prefix XXX --gcode 4 --kingdom Bacteria --rfam --addgenes --locustag XXX TEST.fasta
Let me know if you would like to see the input/output files.
And thanks for maintaining this great software!
Piotr