Open leonjessen opened 5 years ago
Hi Leon, Apologies for the sparse info, I'll see to that as soon as I have time. As far as I can see your example should work fine run on 30 cores (but please let me know if it doesn't!). For smaller jobs it often takes longer to set up the cluster and distribute the processes than it does to just run the job on a single core. But in your case training a model on over 100k sequences it's probably best to use as many cores as you have access to. Of course this assumes that you have sufficient memory to parallelize to this extent, so to start with I would monitor the usage with vmstat or something similar.
Note that you can also set cores
to > 1 for derivePHMM()
too if you are finding this step is taking a while.
I hope that helps, as I say please let me know if you run into issues.
Cheers, Shaun
Hi @shaunpwilkinson, It worked nicely - Thanks for input! I have access to up to 1TB of mem, so I'm all good. It took roughly 10 hours to run! And great package - Let me know if you need some testing :+1:
Hi @shaunpwilkinson ,
I've been reading your post on aphid, but it's sparse on information on how to train using multiple cores.
I have a rather large set of sequences (+100k) and 32 cores, which I'd like to utilise - Could you give an example on how to minimise training time as much as possible? Would the below be a valid workflow and do the trick?
Cheers, Leon