hoelzer-lab / ribap

A comprehensive bacterial core gene-set annotation pipeline based on Roary and pairwise ILPs
GNU General Public License v3.0

Chunk size #49

Open hoelzer opened 1 year ago

hoelzer commented 1 year ago

We have a new --chunks parameter that splits the ILP corpus for faster parallel computation.

However, when the chunk size is too large relative to the number of input genomes, RIBAP crashes. E.g., I tried --chunks 80 for eight input genomes: crash.

We could add a check and a warning. Or even better: we automatically adjust the chunk size when the user specifies a value that is too high for the number of input genomes (not sure what a good formula would be here... e.g. --chunks 200 for 167 Klebsiella was fine, ...)

klamkiew commented 1 year ago

I think the formula is: number of pairwise comparisons == upper limit for --chunks. E.g., 8 input genomes lead to 28 pairwise comparisons (8 * 7 / 2), meaning it doesn't make sense to have more than 28 chunks. Not sure how / when to tell NF this though ;)
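
A minimal sketch of that upper bound, in case it helps the discussion. It assumes the limit really is the number of unordered genome pairs, n * (n - 1) / 2, which matches both examples in this thread (80 > 28 for 8 genomes crashed; 200 is well below the 13861 pairs for 167 genomes). The function names `max_chunks` and `clamp_chunks` are hypothetical and not part of RIBAP, and where such a check would live in the Nextflow workflow is left open.

```python
import sys


def max_chunks(n_genomes: int) -> int:
    """Number of unordered genome pairs, n * (n - 1) / 2 (assumed upper limit)."""
    if n_genomes < 2:
        return 0
    return n_genomes * (n_genomes - 1) // 2


def clamp_chunks(requested: int, n_genomes: int) -> int:
    """Return a chunk count that never exceeds the number of pairwise comparisons.

    Hypothetical helper: warns on stderr and lowers the value instead of crashing.
    """
    upper = max(1, max_chunks(n_genomes))  # always allow at least one chunk
    if requested > upper:
        print(
            f"WARNING: --chunks {requested} exceeds the {upper} pairwise "
            f"comparisons for {n_genomes} genomes; using {upper} instead.",
            file=sys.stderr,
        )
    return max(1, min(requested, upper))


if __name__ == "__main__":
    print(clamp_chunks(80, 8))     # 8 genomes -> 28 pairs, so 80 is lowered to 28
    print(clamp_chunks(200, 167))  # 167 genomes -> 13861 pairs, so 200 is kept
```

Clamping with a warning covers both options from the first comment: the user is told about the mismatch, and the run continues with a usable value.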