tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

A very large gene does not get annotated #562

Closed by AnRibo 3 years ago

AnRibo commented 3 years ago

I have a very large gene that does not get annotated. Prokka instead annotates a hypothetical gene on the opposite strand, even though there are no start codons there. BLASTing this hypothetical gene returns nothing, and other software finds my gene with no issues. Adding the gene with --proteins does not help, although --proteins does work for other genes. The problem could be that the reverse strand has several thousand bp with no stop codons, but shouldn't a gene added with --proteins take priority?

ireneortega commented 3 years ago

I think the --proteins option only annotates each gene after it has been located. That is, Prokka first looks for nucleotide regions that could correspond to a gene, and then uses the annotations of the reference proteins to describe (annotate) that gene. So I think --proteins plays no part in identifying your large gene.

andersgs commented 3 years ago

@AnRibo what is your organism, and what is very large?

--proteins (as @ireneortega alluded to) is used to lift annotations from one genome to another after the CDS regions have been found.

Prokka uses Prodigal as its main CDS-finding engine. You can try supplying Prokka with a Prodigal training file built from your genome (the one that includes the gene):

https://github.com/hyattpd/prodigal/wiki/Gene-Prediction-Modes#training-mode

Then specify the training file in Prokka with --prodigaltf:

https://github.com/tseemann/prokka#command-line-options

And:

https://github.com/tseemann/prokka#option---prodigaltf
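
For readers who want the two steps spelled out, here is a minimal sketch of the workflow described above. The file and directory names (`genome.fna`, `genome.trn`, `prokka_out`, `annotated`) and the `run` helper are placeholders made up for illustration; only the `prodigal -i/-t` and `prokka --prodigaltf/--outdir/--prefix` options come from the tools themselves. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Placeholder names (assumptions): replace with your own files.
GENOME="genome.fna"     # assembled contigs containing the large gene
TRAINING="genome.trn"   # Prodigal training file to be written
OUTDIR="prokka_out"     # Prokka output directory (should not exist yet)

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Step 1: train Prodigal on the genome. With -t, Prodigal writes the
# training file if it does not exist yet.
run prodigal -i "$GENOME" -t "$TRAINING"

# Step 2: run Prokka and hand it the training file via --prodigaltf.
run prokka --prodigaltf "$TRAINING" --outdir "$OUTDIR" --prefix annotated "$GENOME"
```

Whether this recovers the missing gene is not confirmed anywhere in this thread, so compare the resulting GFF against the coordinates reported by the other software.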

AnRibo commented 3 years ago

Thank you very much @ireneortega and @andersgs !

andersgs commented 3 years ago

@AnRibo did it work?