probcomp / crosscat

A domain-general, Bayesian method for analyzing high-dimensional data tables
http://probcomp.csail.mit.edu/crosscat/
Apache License 2.0

Is it possible to engage more than one core? #113

Closed kayhan-batmanghelich closed 7 years ago

kayhan-batmanghelich commented 7 years ago

Hi

Is it possible to use more than one core to speed up MCMC? I can run each chain independently, but I was wondering if there is any other way.

Thanks, Kayhan

fsaad commented 7 years ago

Yep, please use MultiprocessingEngine, which accepts a cpu_count argument:

https://github.com/probcomp/crosscat/blob/master/src/MultiprocessingEngine.py#L36
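For readers who want to see the general idea of spreading independent chains over several cores, here is a minimal sketch using only the Python standard library. It is not CrossCat's code and does not use MultiprocessingEngine: `run_chain`, `run_chains`, and the toy standard-normal target are hypothetical names and choices made for illustration, and the `cpu_count` parameter only borrows the name mentioned above.

```python
import math
import random
from multiprocessing import Pool


def run_chain(job):
    """Run one toy Metropolis chain targeting a standard normal (illustrative only)."""
    seed, n_steps = job
    rng = random.Random(seed)  # per-chain RNG, so chains are independent and reproducible
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, 1.0)
        # Log acceptance ratio for a standard normal target with a symmetric proposal.
        log_accept = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        samples.append(x)
    return samples


def run_chains(n_chains, n_steps, cpu_count=None):
    """Run n_chains independent chains on a pool of worker processes.

    cpu_count is a hypothetical parameter here; None lets Pool use all available cores.
    """
    jobs = [(seed, n_steps) for seed in range(n_chains)]
    with Pool(processes=cpu_count) as pool:
        return pool.map(run_chain, jobs)
```

When run as a script, the call to `run_chains` (for example `run_chains(n_chains=4, n_steps=1000, cpu_count=2)`) should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Each chain gets its own seed, so the results do not depend on which worker picks up which job.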

kayhan-batmanghelich commented 7 years ago

Thanks. Do you think CrossCat can handle a 20,000 x 150 matrix? The website says it has been applied to larger datasets, but do you have a rough idea of how long it might take?

Thanks, Kayhan

fsaad commented 7 years ago

I believe that this version of CrossCat is able to handle 20K rows and 150 columns, especially if the data is all numerical. You may wish to downsample the rows first, say to 2K, investigate the properties and runtime of the sampler, and then keep doubling the number of rows.
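The downsample-and-double procedure could be sketched roughly as follows. This is only an illustration under stated assumptions: `row_schedule` and `time_fits` are hypothetical helpers, the table is assumed to be a list of rows, and `fit` stands in for whatever function runs the sampler on a subsample (it is a placeholder, not a CrossCat call).

```python
import random
import time


def row_schedule(n_rows, start=2000):
    """Return subsample sizes start, 2*start, 4*start, ..., ending at n_rows."""
    if start < 1:
        raise ValueError("start must be at least 1")
    sizes = []
    size = min(start, n_rows)
    while size < n_rows:
        sizes.append(size)
        size *= 2
    sizes.append(n_rows)  # always finish with the full table
    return sizes


def time_fits(table, fit, start=2000, seed=0):
    """Time fit() on random row subsamples of increasing size.

    fit is a placeholder callable supplied by the caller (an assumption here).
    Returns a list of (n_rows, seconds) pairs.
    """
    rng = random.Random(seed)
    timings = []
    for size in row_schedule(len(table), start):
        subsample = rng.sample(table, size)  # rows drawn without replacement
        t0 = time.perf_counter()
        fit(subsample)
        timings.append((size, time.perf_counter() - t0))
    return timings
```

For a 20,000-row table the schedule is 2,000, 4,000, 8,000, 16,000 and finally all 20,000 rows. Watching how the elapsed time grows from one size to the next gives a rough basis for extrapolating the full run.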

For very large analyses, you may consider using Loom, which is an open-source implementation of CrossCat that targets datasets on the order of billions of cells.

kayhan-batmanghelich commented 7 years ago

@fsaad Thank you for the quick reply.

Yes, it is all numerical. So far it has been running for 12 hours (one chain) and has not converged. I will try both. Does Loom allow sampling and computing logP, as CrossCat does? CrossCat is also open-source, what is the difference? Licenses are different?

fsaad commented 7 years ago

> So far it has been running for 12 hours (one chain) and has not converged

How are you determining convergence here?

> CrossCat is also open-source, what is the difference? Licenses are different?

The key difference is that Loom is a highly optimized implementation which can scale to billions of cells. Please refer to the Loom repository for more information about it.