hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0

Fix typo in `score_nodes` function of `node.c` #88

Closed (althonos closed 3 years ago)

althonos commented 3 years ago

Hi there!

The latest version of the code contains a typo that causes the coding penalization to be applied even to large genes on the reverse strand.

donovan-h-parks commented 1 year ago

Does this bug impact the last release of Prodigal (v2.6.3; Feb, 2016), or is this a bug that has been introduced more recently? Just wondering how concerned I should be about this bug, as we use Prodigal extensively. Is there any sense of how much fixing this bug changes the results produced by Prodigal?

althonos commented 1 year ago

The bug is present in v2.6.3, yes. It affects the gene prediction in its entirety, so some genes may be predicted with different coordinates, and others may not be predicted at all.

To get a patched version, you have to recompile the code yourself, as no release has been made since then. Otherwise, consider checking Pyrodigal, which includes the fix as well as some performance improvements.

donovan-h-parks commented 1 year ago

Thanks for the bug fix, further information, and putting together Pyrodigal. We will look to move over to Pyrodigal.

donovan-h-parks commented 1 year ago

Hi @althonos, with the bug fix do you expect to see fewer or more genes being predicted? We're noticing that the fix results in marker genes that were previously being identified no longer being seen. We are digging deeper, but just hoping you can tell us whether the expected result is for fewer genes to be predicted.

althonos commented 1 year ago

Hi Donovan, the two fixes both affect the scoring penalization for some nodes in metagenomic mode. Before the fix, that penalization was applied to all start/stop codons on the reverse strand instead of only to those of "small genes" (<120bp). I can't say consistently, but I've seen results where two small genes on the forward strand were no longer predicted and were replaced by a single larger gene on the reverse strand, since that gene now scores higher with the fix. This is very circumstantial though.
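
To make the described behavior concrete, here is a minimal Python sketch of the difference between the intended and the pre-fix penalization rule. It is a toy model of the behavior described above, not the actual code in `node.c`: the function names, the `strand` encoding (`1` forward, `-1` reverse) and the coordinates are all hypothetical, and every other condition that `score_nodes` may check is ignored. Only the 120 bp threshold comes from this thread.

```python
SMALL_GENE_BP = 120  # "small gene" threshold mentioned in the thread


def is_small_gene(start: int, stop: int) -> bool:
    """Return True when the span between start and stop is under the threshold."""
    return abs(start - stop) < SMALL_GENE_BP


def penalized_intended(start: int, stop: int, strand: int) -> bool:
    """Intended rule (hypothetical model): penalize small genes on either strand."""
    return is_small_gene(start, stop)


def penalized_before_fix(start: int, stop: int, strand: int) -> bool:
    """Pre-fix behavior as described (hypothetical model):
    every reverse-strand node is penalized, whatever the gene length."""
    return strand == -1 or is_small_gene(start, stop)


# A 900 bp gene on the reverse strand (made-up coordinates).
large_reverse = (1000, 100, -1)
print(penalized_intended(*large_reverse))    # False
print(penalized_before_fix(*large_reverse))  # True
```

In this model, a large reverse-strand gene loses its penalty once the rule is corrected, which is consistent with such a gene scoring higher after the fix and displacing smaller forward-strand predictions.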

donovan-h-parks commented 1 year ago

Thanks - we'll dig in.