hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0

Can I use a training file for metagenomic contigs? #42

Closed ohmiya closed 6 years ago

ohmiya commented 6 years ago

Hello,

I am trying to predict viral ORFs from metagenomic viral contigs using Prodigal. As you know, Prodigal has no knowledge of virus-specific gene rules, so I would like to predict viral genes using a training file that I built in advance from other, similar viral metagenomic contigs. However, Prodigal does not seem to accept training files in meta mode. Can I predict the ORFs using normal mode instead?

hyattpd commented 6 years ago

No, Prodigal does not have this capability at the moment (by definition, metagenomic mode uses a preset list of training files).

This is a difficult problem, as viruses often have too little sequence to train on effectively. The general rule for Prodigal training files is that each should be built from one organism, or from organisms similar enough that they use the same rules.

You can currently approximate metagenomic behavior by creating several training files, running Prodigal in normal mode once per training file, and then keeping the genes with the highest scores among all the outputs. (This is close to what metagenomic mode actually does.)
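
The workaround above could be sketched roughly as follows in Python. This is only an illustration, not Prodigal's internal logic. It assumes the `prodigal` binary is on the PATH, that each training file already exists (Prodigal reads an existing file given with `-t`), and that the output is GFF (`-f gff`), where column 6 holds the gene score. The helper names (`run_prodigal`, `parse_gff`, `best_run_per_contig`) are hypothetical, and "highest scores" is interpreted here as the highest summed gene score per contig, which is one possible reading of the advice. Scores from different training files may not be strictly comparable, so treat the selection as a heuristic.

```python
import subprocess
from collections import defaultdict


def run_prodigal(contigs, training_file, out_gff):
    """Run Prodigal in normal mode with one existing training file (GFF output)."""
    subprocess.run(
        ["prodigal", "-i", str(contigs), "-t", str(training_file),
         "-f", "gff", "-o", str(out_gff)],
        check=True,
    )


def parse_gff(gff_text):
    """Return {contig: [(start, end, strand, score), ...]} for CDS rows in GFF text."""
    genes = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        genes[cols[0]].append(
            (int(cols[3]), int(cols[4]), cols[6], float(cols[5]))
        )
    return dict(genes)


def best_run_per_contig(runs):
    """Pick, for each contig, the run with the highest summed gene score.

    runs maps a label (e.g. a training file name) to the output of parse_gff.
    Returns {contig: (label, total_score, genes)}.
    Assumption: summed score per contig is the selection criterion.
    """
    best = {}
    for label, genes_by_contig in runs.items():
        for contig, genes in genes_by_contig.items():
            total = sum(score for _, _, _, score in genes)
            if contig not in best or total > best[contig][1]:
                best[contig] = (label, total, genes)
    return best
```

In use, one would call `run_prodigal` once per training file, read each resulting GFF file, pass the text through `parse_gff`, and hand the labelled results to `best_run_per_contig`. A contig for which a run predicts no genes simply does not appear in that run's results.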

Maybe in a subsequent version, I will allow the user to provide multiple training files in metagenomic mode, which would be added to the current "canned" training files that metagenomic mode goes through.