Open karchjd opened 5 years ago
Related discussion: https://github.com/mllg/batchtools/issues/222
@jakob-r thanks for pointing me to this very relevant discussion. For now, the best approach seems to be either using another package (for example, clustermq) or manually chunking the many small jobs into fewer, larger ones. Adding a chunking option which also chunks the result files seems like a worthwhile feature.
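For reference, batchtools already supports chunking on the submission side via its `chunk()` helper (the result files are still written one per job, which is what this issue is about). A minimal sketch, assuming an existing registry at `"registry"`:

```r
library(batchtools)

# Sketch only: assumes a registry with many small, unsubmitted jobs.
reg <- loadRegistry("registry", writeable = TRUE)

ids <- findNotSubmitted(reg = reg)
# Group the jobs into 100 chunks; each chunk runs as a single
# scheduler job, sharing one R session.
ids$chunk <- chunk(ids$job.id, n.chunks = 100)
submitJobs(ids, reg = reg)
```

This cuts scheduler overhead per job, but each of the original jobs still produces its own result file.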
> Adding a chunking option which also chunks the result files seems like a worthwhile feature.
Yes, this is on my todo, but I'm terribly busy with other projects at the moment.
I am using batchtools to run a simulation study with a very large number (18 million) of very short jobs. I chunk them into 100 jobs, which makes them run quite fast on our cluster (around 24 hours for all jobs). The bottleneck now is gathering all the results, which I do using the following code,
and with the progress bar disabled.
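The original snippet is not shown in the thread; a typical gather call with the progress bar disabled would look something like this (a sketch, assuming the registry lives at `"registry"`):

```r
library(batchtools)

reg <- loadRegistry("registry")

# Disable the progress bar, which adds overhead when iterating
# over millions of results.
options(batchtools.progress = FALSE)

# Load every result into one list -- this performs one file read
# per job, which is the suspected bottleneck.
results <- reduceResultsList(reg = reg)
```

With one result file per job, this loop does 18 million small file reads, so the filesystem's per-file overhead dominates the actual data volume.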
I ran a smaller version of the same experiment with 1 million jobs. Gathering the results took about one hour on one of the nodes of our cluster, so my estimate is that it will take 18 hours to gather the results of the 18 million jobs. I tried moving the registry to my local SSD, but that seemed even slower.
I have two result types, depending on which algorithm is used; they are around 370 and 470 bytes in size.
My guess is that it is so slow because there are so many small result files. So, I would predict that saving all results of one chunk in one file would make gathering the results substantially faster. Is this possible?
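Until such an option exists, one workaround is to batch at the problem level rather than only at submission: make each batchtools job process a whole batch of parameter sets and return a list, so that each job writes a single result file covering many tasks. A hedged sketch, where `simulate_one()` is a hypothetical stand-in for the real per-task simulation:

```r
library(batchtools)

# Hypothetical placeholder for the actual simulation function.
simulate_one <- function(param) param^2

# Split 18 million parameter sets into batches of 10,000 each ...
params  <- seq_len(18e6)
batches <- split(params, ceiling(seq_along(params) / 1e4))

reg <- makeRegistry("registry_batched")

# ... and map over the batches, so each job returns ONE list of
# 10,000 results stored in a single result file, instead of
# 10,000 tiny files.
batchMap(function(batch) lapply(batch, simulate_one),
         batch = batches, reg = reg)
```

This trades the convenience of one-job-per-task bookkeeping for far fewer result files, which is exactly what makes the final gather cheap.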