vinisalazar commented 5 years ago

Hi, I'm trying to run Prodigal on some assembled genomes and I get this message:

Error: Sequence must be 20000 characters (only 13949 read). (Consider running with the -p meta option or finding more contigs from the same genome.)

Can you please explain what it means? I'm using complete whole genomes, so wouldn't it be inadvisable to use the meta parameter?

Thank you for any assistance you can provide.

V

hyattpd commented 5 years ago

This is explained in more detail here: https://github.com/hyattpd/prodigal/wiki/Advice-by-Input-Type#plasmids-phages-viruses-and-other-short-sequences

Prodigal needs genes on which to train, preferably 100kb+ of sequence. A sub-20k genome doesn't have enough genes on which to gather data, so you are better off running in anonymous (meta) mode or collecting a large number of closely-related small genomes and training on a combined file of them in normal mode.

Although the precalculated "meta" clusters are all derived from bacteria, they'll still likely do a better job, even on viral genomes, than self-training, because these files contain the full range of GC content, SD motifs, and thermophilic vs. non-thermophilic sequence biases (unless your sequence uses a really unusual genetic code, i.e. not 4, 11, or 25).
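As a rough illustration of the advice above, here is a minimal Python sketch that counts the sequence characters in a FASTA file and adds `-p meta` to the Prodigal command when the total falls below the 20000 threshold quoted in the error message. The helper names (`total_sequence_length`, `prodigal_command`), the file names, and the exact comparison against the threshold are assumptions made for illustration, not something taken from Prodigal itself; only the `-i`, `-o`, and `-p meta` options are Prodigal's own.

```python
import io

# Threshold quoted in the error message; assumed here to apply to the
# total number of sequence characters read from the input file.
MIN_SINGLE_GENOME = 20000


def total_sequence_length(handle):
    """Count sequence characters in a FASTA stream, skipping header lines."""
    total = 0
    for line in handle:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def prodigal_command(fasta_path, total_length, out_path="genes.gbk"):
    """Build a Prodigal argument list, adding -p meta for short inputs."""
    cmd = ["prodigal", "-i", fasta_path, "-o", out_path]
    if total_length < MIN_SINGLE_GENOME:
        # Too little sequence to self-train: fall back to anonymous mode.
        cmd += ["-p", "meta"]
    return cmd


if __name__ == "__main__":
    # Hypothetical in-memory FASTA standing in for a short phage genome.
    demo = io.StringIO(">phage_demo\nATGC\nATGCAA\n")
    length = total_sequence_length(demo)
    print(length, prodigal_command("phage_demo.fasta", length))
```

The resulting list can be passed to `subprocess.run` once Prodigal is installed. For the alternative of training on closely related small genomes, the same length check would be applied to the combined FASTA file instead.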

If you want to try self-training anyway, you can go into https://github.com/hyattpd/Prodigal/blob/GoogleImport/main.c

and change line 32:

#define MIN_SINGLE_GENOME 20000

to some smaller number and recompile, but this isn't recommended.

regards, doug

vinisalazar commented 5 years ago

Thank you! This has helped a lot.

hyattpd / Prodigal

Error: Sequence must be 20000 characters (only 13949 read). #51
