Closed vinisalazar closed 5 years ago
This is explained in more detail here: https://github.com/hyattpd/prodigal/wiki/Advice-by-Input-Type#plasmids-phages-viruses-and-other-short-sequences
Prodigal needs genes on which to train, preferably 100kb+ of sequence. A sub-20k genome doesn't have enough genes on which to gather data, so you are better off running in anonymous (meta) mode or collecting a large number of closely-related small genomes and training on a combined file of them in normal mode.
Although the precalculated "meta" clusters are all derived from bacteria, they will still likely do a better job than self-training, even on viral genomes, because these files cover the full range of GC content, SD motifs, and thermophilic vs. non-thermophilic sequence biases. (The exception is a sequence that uses an unusual genetic code, i.e. one other than 4, 11, or 25.)
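For anyone scripting this, here is a minimal Python sketch of the two options above, not anything shipped with Prodigal. It counts the bases in a FASTA file and adds `-p meta` when the total is under the 20000 characters quoted in the error message. It also builds the two-step command pair for training on a combined file with `-t`. The helper names and the `MIN_TRAINING_LENGTH` constant are hypothetical, and it assumes a Prodigal 2.x build where leaving out `-p` gives the default single-genome mode.

```python
# Threshold taken from the error message; assumed to match the installed build.
MIN_TRAINING_LENGTH = 20000  # hypothetical name, for illustration only


def fasta_length(path):
    """Count sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def prodigal_command(fasta, output, min_length=MIN_TRAINING_LENGTH):
    """Build a Prodigal command line, adding -p meta for short inputs."""
    cmd = ["prodigal", "-i", str(fasta), "-o", str(output)]
    if fasta_length(fasta) < min_length:
        # Too little sequence to self-train, so use the precalculated models.
        cmd += ["-p", "meta"]
    return cmd


def training_commands(combined_fasta, target_fasta, training_file, output):
    """Build the train-then-predict pair for a combined file of related genomes.

    Assumes -t writes the training file when it does not exist yet and
    reads it when it does.
    """
    train = ["prodigal", "-i", str(combined_fasta), "-t", str(training_file)]
    predict = ["prodigal", "-i", str(target_fasta),
               "-t", str(training_file), "-o", str(output)]
    return train, predict


# To actually run a command (requires prodigal on PATH):
#   import subprocess
#   subprocess.run(prodigal_command("genome.fna", "genes.gbk"), check=True)
```

The commands are built as lists rather than shell strings so that file names with spaces pass through `subprocess.run` unchanged.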
If you want to try self-training anyway, you can go into https://github.com/hyattpd/Prodigal/blob/GoogleImport/main.c and change the value on line 32 to some smaller number and recompile, but this isn't recommended.
regards, doug
Thank you! This has helped a lot.
Hi, I'm trying to run Prodigal on some assembled genomes and I get this message:
Error: Sequence must be 20000 characters (only 13949 read). (Consider running with the -p meta option or finding more contigs from the same genome.)
Can you please explain what it means? I'm using complete genomes, so wouldn't it be inadvisable to use the meta option?
Thank you for any assistance you can provide.
V