hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0
432 stars 85 forks source link

Faster prodigal. #94

Closed jaebeom-kim closed 2 years ago

jaebeom-kim commented 2 years ago

Hi, I'm Jaebeom Kim and developing a metagenomic classifier, Metabuli. I'm using prodigal to predict genes of thousands of prokaryotic genomes, so I'm working on making prodigal faster.

Here are some modifications. 1) Rapid mode. In the function named 'dprog', I found a suspicious part. I think lines 52 and 53 are making the program slow. When i==999 and MAX_NODE_DIST==500, 'min' becomes 0 in line line 54, which leads to calling 'score_connection' 999 times. I'm not sure it is intended or not. So I just make an 'Rapid mode' to jump the lines. When I tested with E.coli genome (GCF_000008865.2), Rapid mode decreased the running time (training + prediction) from ~9.9 sec to ~6.5 sec, while producing the same results.

2) MAX_NODE_DIST While reviewing the source code, I found that reducing MAX_NODE_DIST in dprog.h can decrease running time. So, I decreased it from 500 to 300 and tested it using E.coli genome. The running time (training + predicting) decreased from ~9.7 sec to ~7.2 sec, while producing the same result. But still, it may lead to prediction of lower quality in other cases, so I just made it as an option for users who want to get results faster.

When rapid mode is used with MAX_NODE_DIST 300, the running time was 5.5 sec

I tried to follow the code style :) If you like the changes, please accept this PR and update the conda package as well.