nigyta / dfast_core

DDBJ Fast Annotation and Submission Tool

DFAST cannot predict the dnaA gene in a bacterial genome #22

Closed ywangbio closed 4 years ago

ywangbio commented 4 years ago

I have tried several bacterial genomes and found that DFAST cannot predict the dnaA gene in a bacterial genome when using Prokka. However, standalone Prokka predicts it without any problem. What is happening with Prokka inside DFAST?

nigyta commented 4 years ago

Thank you for trying DFAST.

Is the dnaA gene located at the first position in your genome? MetaGeneAnnotator (MGA), which is the default gene prediction tool of DFAST, inspects the upstream regions of ORFs. Therefore, genes located at the very start of a contig are sometimes missed, or are annotated as partial genes, which are removed during subsequent processing.

I recommend setting the first position of the sequence about 100 bp upstream of dnaA. You can do this easily with the DFAST web version.

Enable the "Rotate/flip the chromosome so that the dnaA gene comes first" option in the Advanced options.
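For anyone who prefers to do the rotation locally, here is a minimal sketch of the idea, not the code the web version uses. It assumes a complete circular chromosome held as a single string, a dnaA gene on the forward strand, and a known 1-based start coordinate; the function name rotate_circular and its parameters are made up for illustration. If dnaA is on the reverse strand, the sequence would also need to be reverse-complemented (the "flip" part), which this sketch does not do.

```python
def rotate_circular(seq: str, gene_start: int, upstream: int = 100) -> str:
    """Rotate a circular sequence so that it begins `upstream` bp before a gene.

    seq        : the full circular chromosome as one string (assumption)
    gene_start : 1-based coordinate of the first base of the gene,
                 forward strand only (assumption)
    upstream   : number of bases to keep in front of the gene
    """
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    # Convert to a 0-based index, step back `upstream` bases,
    # and wrap around the origin with the modulo.
    new_origin = (gene_start - 1 - upstream) % n
    return seq[new_origin:] + seq[:new_origin]


if __name__ == "__main__":
    toy = "AAAACCCCGGGGTTTT"
    # Gene starts at position 9 (the first G); keep 4 bases upstream of it.
    print(rotate_circular(toy, gene_start=9, upstream=4))
```

After rotation the gene starts at 0-based index `upstream`, so MGA has upstream sequence to inspect. This only makes sense for a closed circular replicon; rotating a draft contig would join two ends that are not actually adjacent.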

Another recommendation is to use prodigal.

dfast -g genome.fna --use_prodigal

Make sure that prodigal is installed and is included in your PATH.
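A quick way to check the PATH requirement before launching a run is sketched below. It uses only the Python standard library; the helper name find_tool is hypothetical and is not part of DFAST.

```python
import shutil
from typing import Optional


def find_tool(name: str = "prodigal") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("prodigal was not found on PATH; install it or adjust PATH "
              "before running dfast with --use_prodigal")
    else:
        print(f"prodigal found at {path}")
```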

If this happens again, invoke dfast with the --debug option, which keeps the intermediate files. You can then find the MGA result in the StructuralAnnotation directory.

ywangbio commented 4 years ago

I am sorry, I made a mistake in my question: the tool I used in DFAST is prodigal, not Prokka. I have already tried the MGA method, and MGA predicted the dnaA gene properly.

Also, I compared the gene lists predicted by MGA and prodigal using DFAST. I found that MGA predicted fewer hypothetical proteins than prodigal but missed several genes that prodigal found.

So, I think the best way to use DFAST is to use MGA to predict the first gene, dnaA, and prodigal to predict the other genes. However, Prokka does not have such a problem.
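A comparison like the one described above can be sketched as follows. This is only an illustration, assuming both runs produced standard tab-separated GFF3 text with CDS features; the function names cds_keys and compare_predictions are hypothetical. Genes are matched on exact coordinates, so two predictions of the same gene with different start codons are counted as different.

```python
def cds_keys(gff_text: str) -> set:
    """Collect (seqid, start, end, strand) for every CDS feature in GFF3 text."""
    keys = set()
    for line in gff_text.splitlines():
        # Skip blank lines, comments, and directives such as ##gff-version
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A GFF3 feature line has 9 columns; column 3 is the feature type.
        # Embedded FASTA lines have no tabs and are skipped by this check.
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        keys.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return keys


def compare_predictions(gff_a: str, gff_b: str) -> dict:
    """Split CDS predictions into shared, only in A, and only in B."""
    a, b = cds_keys(gff_a), cds_keys(gff_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    # Toy input, not real DFAST output
    mga = "seq1\tMGA\tCDS\t1\t300\t.\t+\t0\tID=a\n"
    prodigal = "seq1\tProdigal\tCDS\t400\t900\t.\t-\t0\tID=x\n"
    result = compare_predictions(mga, prodigal)
    print(len(result["only_a"]), "CDS only in the first set")
    print(len(result["only_b"]), "CDS only in the second set")
```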

ywangbio commented 4 years ago

I also tried dfast with the --debug option, but the dnaA gene was not predicted.

nigyta commented 4 years ago

Well, this seems specific to your data. If you can share the genome, please create a job using the web version. I will look into it.

ywangbio commented 4 years ago

Yes, the DFAST web server predicted the dnaA gene correctly under both conditions (with or without the offset setting).

Without the offset setting:
Job Title: N_halophilus prediction
Job ID: 854ca2d3-391a-4122-b7a8-0adc69c88a05
Submitted at: 2019-12-04 09:01:29.257414

With the offset setting:
Job Title: N_halophilus prediction
Job ID: 35dca14b-1d18-4aa7-a418-2875880d0889
Submitted at: 2019-12-04 09:16:36.528528

Although DFAST-core cannot predict the dnaA gene without the offset setting, it does predict the dnaA gene correctly with the offset set to 100. Perfect!

Thank you so much for your detailed instructions. I really like DFAST-core. It is so convenient for both gene prediction and submission to a database.

nigyta commented 4 years ago

Thank you for letting me know the result. If you have any trouble, feel free to ask again.