hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0

Does training prodigal on multiple genomes instead of just one increase accuracy? #87

Closed avilaHugo closed 1 year ago

avilaHugo commented 3 years ago

Dear developers,

I'm using Prokka to re-annotate some public genomes. My organism of interest has over 600 complete RefSeq genomes, so I was trying to use lavish training mode to create a training file from all of them. I hit the error described in issue #54 (https://github.com/hyattpd/Prodigal/issues/54) and followed the steps suggested there (comment out the limit and recompile the code), but afterwards I was left with two questions. First, does training on several complete genomes instead of just one actually increase the accuracy of the training process? Second, is the sequence length limit of 32,000,000 bp a memory allocation decision, or is it an empirical limit above which additional training data gives no significant benefit?
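For context, here is a rough way to check how far a set of genomes exceeds that limit before concatenating them. This is only a sketch: it assumes plain, uncompressed FASTA files, the helper names `fasta_length` and `total_length` are made up for illustration and are not part of Prodigal or Prokka, and the constant simply repeats the 32,000,000 bp figure quoted above.

```python
# Limit quoted in this thread; assumed to correspond to MAX_SEQ in the Prodigal source.
MAX_SEQ = 32_000_000


def fasta_length(path):
    """Count sequence characters in one FASTA file, skipping '>' header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def total_length(paths):
    """Sum the sequence lengths of several FASTA files (hypothetical helper)."""
    return sum(fasta_length(p) for p in paths)


def exceeds_limit(paths, limit=MAX_SEQ):
    """Return True if the combined sequence length is above the limit."""
    return total_length(paths) > limit
```

Calling `exceeds_limit(list_of_fasta_paths)` on the genomes intended for training shows whether the combined input is over the limit; it says nothing about whether training on more sequence improves accuracy.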

avilaHugo commented 3 years ago

Btw, I tried running with all 600 genomes after editing MAX_SEQ and the program crashed with a core dump.