Dear developers,
I'm using Prokka to re-annotate some public genomes. My organism of interest has over 600 complete RefSeq genomes, so I tried to use lavish training mode to create a training file from all of them. I received the error mentioned in this issue (https://github.com/hyattpd/Prodigal/issues/54). I followed the steps described there (commenting out the limit and recompiling the code), but afterwards I had some doubts. First, does training on several complete genomes instead of just one increase the accuracy of the training? Second, is the sequence length limit of 32,000,000 bp a memory-allocation decision, or is it an empirical limit on training accuracy, beyond which additional sequence gives no significant benefit?
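For illustration, here is a minimal Python sketch that totals the bases in a multi-FASTA file and compares the total with the 32,000,000 bp limit. It assumes the limit applies to the combined length of all training sequences, and it ignores any spacer Prodigal may insert between records. The function names are hypothetical helpers, not part of Prodigal or Prokka.

```python
from typing import Iterable

# Assumed value of the compile-time limit discussed in Prodigal issue #54.
MAX_SEQ = 32_000_000


def total_sequence_length(lines: Iterable[str]) -> int:
    """Sum residue counts across all records of a multi-FASTA, skipping headers."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        total += len(line)
    return total


def fits_training_limit(lines: Iterable[str], limit: int = MAX_SEQ) -> bool:
    """Return True if the combined sequence length stays within `limit` bases."""
    return total_sequence_length(lines) <= limit
```

Usage (the file name is hypothetical): `with open("genomes.fna") as fh: print(fits_training_limit(fh))`.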