hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0
432 stars 85 forks source link

Fix exception to distance constraints in `dprog.c` #108

Open althonos opened 7 months ago

althonos commented 7 months ago

To restrict computation to a banded diagonal of the dynamic programming matrix the dprog function computes the offset of the furthest node to consider when scoring a node. In the case of giant ORFs, this limit must sometimes be increased to make sure the complete ORF is always accounted for. In the current code, this is done in a faulty way and causes connections to be sometimes recomputed for the entire contig, because min is always reset to zero in such cases.

I tested the patch with 50 genomes and the results were the same to the main branch (according to the scores file output); however on a large genome (Sorangium cellulosum) the patch saves 9s (40%) of runtime:

Benchmark 1: ./prodigal -i GCF_000067165.1_ASM6716v1_genomic.fna -o /dev/null
  Time (mean ± σ):     15.744 s ±  0.924 s    [User: 15.639 s, System: 0.069 s]
  Range (min … max):   14.556 s … 16.704 s    4 runs

Benchmark 2: ./prodigal_orig -i GCF_000067165.1_ASM6716v1_genomic.fna -o /dev/null
  Time (mean ± σ):     22.440 s ±  1.216 s    [User: 22.296 s, System: 0.076 s]
  Range (min … max):   21.249 s … 24.135 s    4 runs

Summary
  ./prodigal -i GCF_000067165.1_ASM6716v1_genomic.fna -o /dev/null ran
    1.43 ± 0.11 times faster than ./prodigal_orig -i GCF_000067165.1_ASM6716v1_genomic.fna -o /dev/null