kayhan-batmanghelich closed 7 years ago
Yep, please use MultiprocessingEngine, which accepts a cpu_count argument:
https://github.com/probcomp/crosscat/blob/master/src/MultiprocessingEngine.py#L36
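For intuition, the general pattern of spreading independent chains over several cores can be sketched with the Python standard library alone. This is not CrossCat's API: `run_chain`, `run_chains`, and the toy Metropolis sampler targeting a standard normal are hypothetical stand-ins, and only the name `cpu_count` is borrowed from the engine mentioned above.

```python
import math
import random
from multiprocessing import Pool


def run_chain(args):
    """Run one toy Metropolis chain (hypothetical stand-in for a real sampler).

    Targets a standard normal and returns the mean of the visited states.
    """
    seed, n_steps = args
    rng = random.Random(seed)  # one RNG per chain: independent and reproducible
    x = 0.0
    total = 0.0
    for _ in range(n_steps):
        proposal = x + rng.uniform(-1.0, 1.0)
        # Log acceptance ratio for a standard normal target.
        log_accept = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        total += x
    return total / n_steps


def run_chains(n_chains, n_steps, cpu_count):
    """Run n_chains independent chains on up to cpu_count worker processes."""
    jobs = [(seed, n_steps) for seed in range(n_chains)]
    with Pool(processes=cpu_count) as pool:
        return pool.map(run_chain, jobs)


if __name__ == "__main__":
    # Four chains spread over two worker processes.
    print(run_chains(n_chains=4, n_steps=20000, cpu_count=2))
```

Each chain gets its own seed, so the chains do not share random state, and the results come back in the same order as the seeds.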
Thanks. Do you think CrossCat can handle a 20,000 x 150 matrix? The website says it has been applied to larger datasets, but do you have a rough idea of how long it might take?
Thanks, Kayhan
I believe that this version of CrossCat is able to handle 20K rows and 150 columns, especially if the data is all numerical. You may wish to downsample the rows first, say to 2K, investigate the properties and runtime of the sampler, and then keep doubling the number of rows.
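The downsample-then-double approach can be sketched as a small timing harness. This is a minimal sketch under assumptions: `doubling_schedule` and `time_on_subsamples` are hypothetical helpers, and `run_sampler` is a placeholder callback for whatever fixed number of sampler iterations you want to time, not a CrossCat function.

```python
import random
import time


def doubling_schedule(start_rows, total_rows):
    """Return row counts start, 2*start, 4*start, ..., capped at total_rows."""
    if start_rows <= 0:
        raise ValueError("start_rows must be positive")
    sizes = []
    n = start_rows
    while n < total_rows:
        sizes.append(n)
        n *= 2
    sizes.append(total_rows)  # always finish with the full dataset
    return sizes


def time_on_subsamples(rows, start_rows, run_sampler, seed=0):
    """Time run_sampler (a hypothetical callback) on growing random row subsets."""
    rng = random.Random(seed)
    timings = []
    for n in doubling_schedule(start_rows, len(rows)):
        subset = rng.sample(rows, n)  # rows sampled without replacement
        t0 = time.perf_counter()
        run_sampler(subset)
        timings.append((n, time.perf_counter() - t0))
    return timings


# For 20K rows starting at 2K, the sizes tried are:
print(doubling_schedule(2000, 20000))  # [2000, 4000, 8000, 16000, 20000]
```

Plotting elapsed time against row count from the returned list gives a rough sense of how runtime scales before committing to the full 20K rows.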
For very large analyses, you may consider using Loom, which is an open-source implementation of CrossCat that targets datasets on the order of billions of cells.
@fsaad Thank you for the quick reply.
Yes, it is all numerical. So far it has been running for 12 hours (one chain) and has not converged. I will try both. Does Loom allow sampling and computing logP (as is supported in CrossCat)? CrossCat is also open-source, so what is the difference? Are the licenses different?
> So far it has been running for 12 hours (one chain) and has not converged
How are you determining convergence here?
> CrossCat is also open-source, so what is the difference? Are the licenses different?
The key difference is that Loom is a highly optimized implementation which can scale to billions of cells. Please refer to the Loom repository for more information about it.
Hi
Is it possible to use more than one core to speed up MCMC? I can run each chain independently, but I was wondering whether there is any other way.
Thanks, Kayhan