morris-lab / CellOracle

This is the alpha version of the CellOracle package

(doc request) How do I use Bayesian ridge regression? #18

Closed: concatenize closed this issue 4 years ago

concatenize commented 4 years ago

I looked in help(oracle.get_links), but I only see documentation for bagging ridge regression. Is there a way to specify Bayesian ridge regression instead? I'm asking because a test run with 50x fewer cells than my actual data, 25x fewer clusters (test mode), and 4x fewer bagging rounds than the tutorial uses currently takes 40-50 minutes on a 2016 MacBook. I don't know whether everything scales linearly, but if it does, the full run would take 50*25*4 = 5000 times longer: 5000*45 minutes / (24*60 min/day) ≈ 156 days. And I'll need to pack up and move to grad school before then!
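The back-of-the-envelope estimate can be written out as a small calculation. Like the question, it assumes runtime grows linearly with each size factor (cells, clusters, bagging rounds). That is an assumption, not a measured property of CellOracle. The helper name is hypothetical.

```python
def extrapolate_runtime_minutes(test_minutes, *scale_factors):
    """Scale a test-run time by each size factor.

    Assumes (hypothetically) that runtime is linear in every factor.
    """
    total = test_minutes
    for factor in scale_factors:
        total *= factor
    return total


# 45 min test run; 50x more cells, 25x more clusters, 4x more bagging rounds
minutes = extrapolate_runtime_minutes(45, 50, 25, 4)
days = minutes / (24 * 60)
print(f"{minutes} min ~ {days:.0f} days")  # 225000 min ~ 156 days
```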

KenjiKamimoto-ac commented 4 years ago

As you know, we use the "oracle" class for GRN analysis. It processes many kinds of data and performs the GRN calculations. However, the core GRN algorithm used inside the oracle class is the "Net" class. https://morris-lab.github.io/CellOracle.documentation/modules/celloracle.html#celloracle.Net

In other words, the oracle class is just a wrapper that makes the Net class easier to use.

Although the current documentation does not yet explain how to use Bayesian ridge with the oracle object, you can use the "Net" class instead of the oracle class to run Bayesian ridge regression. https://morris-lab.github.io/CellOracle.documentation/modules/celloracle.html#celloracle.Net
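To illustrate why Bayesian ridge can be much cheaper than bagging ridge, here is a minimal NumPy sketch of both estimators for a single target gene regressed on its candidate regulators. It is a textbook illustration and not CellOracle's internal implementation or the Net class API. Bagging ridge averages closed-form ridge fits over bootstrap resamples of the cells, so it costs `bagging_number` fits per target gene. Bayesian ridge here uses MacKay-style evidence maximization: it fits once and learns the regularization strength from the data. All function names are hypothetical, and the sketch omits an intercept, so it assumes centered expression values.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def bagging_ridge(X, y, alpha=1.0, bagging_number=20, seed=0):
    """Average ridge coefficients over bootstrap resamples of the cells.
    Costs `bagging_number` ridge fits per target gene."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coefs = []
    for _ in range(bagging_number):
        idx = rng.integers(0, n, size=n)  # resample cells with replacement
        coefs.append(ridge_fit(X[idx], y[idx], alpha))
    coefs = np.array(coefs)
    return coefs.mean(axis=0), coefs  # mean coefficients and per-round coefficients


def bayesian_ridge(X, y, n_iter=300, tol=1e-6):
    """Bayesian ridge via evidence maximization (fixed-point updates).
    lam: precision of the Gaussian prior on weights; beta: noise precision."""
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    eig = np.linalg.eigvalsh(XtX)
    lam, beta = 1.0, 1.0 / np.var(y)
    for _ in range(n_iter):
        A = lam * np.eye(p) + beta * XtX
        m = beta * np.linalg.solve(A, Xty)  # posterior mean of the weights
        gamma = np.sum(beta * eig / (lam + beta * eig))  # effective number of parameters
        lam_new = gamma / (m @ m)
        beta_new = (n - gamma) / np.sum((y - X @ m) ** 2)
        converged = abs(lam_new - lam) < tol * lam and abs(beta_new - beta) < tol * beta
        lam, beta = lam_new, beta_new
        if converged:
            break
    A = lam * np.eye(p) + beta * XtX
    m = beta * np.linalg.solve(A, Xty)
    return m, np.linalg.inv(A)  # posterior mean and posterior covariance


# Synthetic example: 500 cells, 5 candidate regulators, one target gene
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
true_w = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
y = X @ true_w + rng.normal(scale=0.5, size=500)

w_bag, per_round = bagging_ridge(X, y, alpha=1.0, bagging_number=20)  # 20 fits
w_bayes, cov = bayesian_ridge(X, y)  # one fit, regularization learned from data
print(np.round(w_bag, 2))
print(np.round(w_bayes, 2))
```

Both estimates should land close to `true_w`. Bagging quantifies coefficient uncertainty through the spread across rounds. Bayesian ridge gets it from the posterior covariance `cov` without refitting. That is where the time saving comes from in this sketch. How the saving carries over to CellOracle depends on how the Net class runs its estimators.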

That said, I think using Net is more difficult than using the oracle class. We will update the CellOracle documentation to explain how to use Bayesian ridge. Please wait a while.

concatenize commented 4 years ago

OK, thank you! I will try it with Net, but if I can't figure it out, I will wait. I appreciate the fast response!

KenjiKamimoto-ac commented 4 years ago

Also, I have several suggestions for speeding up your calculation.

(1) Use fewer genes. In the tutorial, we recommend filtering out non-variable genes and keeping only the top 2000-3000 highly variable genes. The number of genes strongly affects calculation time.

(2) Use a multi-core computer. The calculation runs in parallel on a multi-core CPU, so a computer or computational cluster with many CPU cores will finish sooner. One good option is a cloud computing service such as AWS or Google Cloud.
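Both suggestions can be sketched together with NumPy and the standard library. The sketch keeps only the most variable genes, then runs the independent per-gene regressions concurrently. It is a simplified illustration with hypothetical helper names, not CellOracle's code. It ranks genes by raw variance, whereas the tutorial's preprocessing may use a dispersion-based criterion instead. It also regresses each gene on every other retained gene, whereas CellOracle restricts candidate regulators using the base GRN. As a result, the exact scaling differs, but in both cases fewer genes means fewer and smaller regressions.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def top_variable_genes(expr, n_top=2000):
    """Column indices of the n_top genes with the highest variance across cells.
    expr: cells x genes matrix of normalized expression."""
    n_top = min(n_top, expr.shape[1])
    return np.argsort(expr.var(axis=0))[::-1][:n_top]


def fit_one_gene(expr, target, alpha=1.0):
    """Ridge-regress one target gene on all other retained genes
    (a hypothetical stand-in for regressing on its candidate regulators)."""
    regulators = np.delete(np.arange(expr.shape[1]), target)
    X, y = expr[:, regulators], expr[:, target]
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def fit_all_genes(expr, n_jobs=None, alpha=1.0):
    """Fit every target gene independently; fits run concurrently in a thread pool.
    Results are returned in gene order."""
    n_jobs = n_jobs or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(lambda g: fit_one_gene(expr, g, alpha), range(expr.shape[1])))


# Synthetic data: 1000 cells x 50 genes with different per-gene variances
rng = np.random.default_rng(0)
scales = rng.uniform(0.1, 3.0, size=50)
expr = rng.normal(size=(1000, 50)) * scales

keep = top_variable_genes(expr, n_top=10)  # suggestion (1): fewer genes
coefs = fit_all_genes(expr[:, keep], n_jobs=4)  # suggestion (2): use several cores
print(len(keep), len(coefs), coefs[0].shape)  # 10 10 (9,)
```

Threads are used here for simplicity. NumPy's linear-algebra routines typically release the GIL, so the per-gene solves can overlap on several cores. If the per-gene work were mostly pure Python, a `ProcessPoolExecutor` would be needed instead to get real parallelism.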