hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0

gene prediction labeled as partial if it starts at contig edge #15

Closed: VidJa closed this issue 8 years ago.

VidJa commented 8 years ago

This might actually be behaviour 'as designed', but I noticed that a gene is classified as 'partial' when it starts at the edge of a contig, even if the full sequence is present. I noticed this after searching for dnaA and rearranging a circular chromosome to start at dnaA. Maybe a flag for circularity would help, so that Prodigal would know that the promoter of the first gene might be at the 'end' of the given sequence.
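Rearranging a circular chromosome amounts to rotating its sequence. Below is a minimal Python sketch, assuming the whole circular sequence is held in memory as a string; `rotate_circular` and its `upstream` parameter are hypothetical names for illustration, not part of Prodigal. Cutting some bases upstream of the gene, rather than exactly at its first nucleotide, keeps the wrapped-around upstream region in front of it, although any other gene that spans the new cut point will then be split.

```python
# Hypothetical helper for illustration; not part of Prodigal.
def rotate_circular(seq: str, gene_start: int, upstream: int = 0) -> str:
    """Rotate a circular sequence so that the 1-based position `gene_start`
    ends up at position `upstream + 1` of the result, with `upstream` bases
    of wrapped-around context placed in front of it."""
    n = len(seq)
    if not 1 <= gene_start <= n:
        raise ValueError("gene_start must be a 1-based position within seq")
    if not 0 <= upstream < n:
        raise ValueError("upstream must be smaller than the sequence length")
    # 0-based cut point; the modulo wraps around the origin if needed.
    cut = (gene_start - 1 - upstream) % n
    return seq[cut:] + seq[:cut]


# Toy sequence with an ATG at position 4, rotated with 2 bases of upstream context.
print(rotate_circular("TTTATGAAA", 4, upstream=2))  # TTATGAAAT
```

With `upstream=0` this reproduces the rearrangement described above (the gene starts at base 1).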

Prediction on the raw contig, where dnaA sits somewhere in the middle:

unitig_0_quiver Prodigal_v2.6.2 CDS 872920 874320 357.5 + 0 ID=1_808;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.633;conf=99.99;score=357.50;cscore=357.42;sscore=0.08;rscore=-3.89;uscore=0.47;tscore=4.15;

Prediction on the rearranged contig, where dnaA starts at the first nucleotide:

Chrom1 Prodigal_v2.6.2 CDS 1 1401 360.8 + 0 ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.633;conf=99.99;score=360.75;cscore=357.54;sscore=3.22;rscore=0.00;uscore=3.22;tscore=0.00;

Both predictions cover exactly the same 1,401 bp sequence (of course).
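The difference between the two predictions lies in the `partial` and `start_type` attributes. Per the Prodigal documentation, `partial` is a two-character flag: the first character refers to the left edge and the second to the right edge, where 1 means the gene runs off that edge and 0 means it has a true boundary. A small Python sketch for reading the attribute column of the two GFF lines; the helper names are illustrative assumptions, and the attribute strings are abridged from the lines above.

```python
def parse_attributes(column9: str) -> dict[str, str]:
    """Split a GFF attribute column such as 'ID=1_1;partial=10;...' into a dict."""
    fields = (f.split("=", 1) for f in column9.strip().split(";") if f)
    return dict(fields)


def describe_partial(flag: str) -> str:
    """Interpret Prodigal's partial flag: first char = left edge, second = right edge."""
    left_open, right_open = flag[0] == "1", flag[1] == "1"
    if left_open and right_open:
        return "partial at both edges"
    if left_open:
        return "partial at left edge"
    if right_open:
        return "partial at right edge"
    return "complete"


def gene_length(start: int, end: int) -> int:
    """GFF coordinates are 1-based and inclusive."""
    return end - start + 1


# Attribute columns abridged from the two predictions above.
raw = parse_attributes("ID=1_808;partial=00;start_type=ATG;rbs_motif=None;")
rotated = parse_attributes("ID=1_1;partial=10;start_type=Edge;rbs_motif=None;")
print(describe_partial(raw["partial"]), raw["start_type"], gene_length(872920, 874320))
# complete ATG 1401
print(describe_partial(rotated["partial"]), rotated["start_type"], gene_length(1, 1401))
# partial at left edge Edge 1401
```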

hyattpd commented 8 years ago

Prodigal has a "-c" option to prevent partial genes from being predicted (suitable for finished genomes).
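A minimal Python sketch of calling Prodigal with `-c` from a script, assuming the `prodigal` binary is on `PATH`. The flags used (`-i` input, `-o` output, `-f gff` output format, `-c` closed ends) are Prodigal's command-line options; the wrapper function names are hypothetical.

```python
import subprocess


def prodigal_command(fasta: str, gff_out: str, closed: bool = True) -> list[str]:
    """Build a Prodigal command line that writes GFF output (hypothetical wrapper)."""
    cmd = ["prodigal", "-i", fasta, "-o", gff_out, "-f", "gff"]
    if closed:
        # -c ("closed ends"): do not allow genes to run off the sequence edges.
        cmd.append("-c")
    return cmd


def run_prodigal(fasta: str, gff_out: str, closed: bool = True) -> None:
    """Run Prodigal; assumes the `prodigal` binary is on PATH."""
    subprocess.run(prodigal_command(fasta, gff_out, closed), check=True)
```

For a finished circular chromosome this would be called as, e.g., `run_prodigal("chrom1.fasta", "chrom1.gff")`; the file names are placeholders. Passing `closed=False` keeps the default draft behaviour.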

The default ("draft") behavior is deliberate, as careful examination of draft/metagenome datasets revealed that most genes with start codons at base 1 are still partial (and the true start codon is missing).