vdorie / bartCause

Causal Inference using Bayesian Additive Regression Trees

Does the package scale with a large number of variables and observations? #3

Open ferlocar opened 5 years ago

ferlocar commented 5 years ago

Hi Vincent,

Thanks again for the great package. I'm using it to estimate the average treatment effect of an online recommendation. However, the data set is (somewhat) large: it has 70K observations and 30 variables. The data itself is only 20 MB, but when I call the bartc function, it takes several minutes to run and returns an object of 6.7 GB. I'm thinking about using data sets much larger than this, so I'm worried that the package will not work well for them.

So, I'm wondering, have you tried the package with data sets as large as this one? If so, did you have similar issues? How did you solve them?

Thanks in advance for your help!

vdorie commented 5 years ago

Do you have the latest dbarts from the repository? I've been toying with various ways of keeping trees from BART models and that can cause the memory demands to increase by a lot. If they're not being saved (the current default), the memory required should be much, much smaller.

Saving them was the default at some point; I can't remember when. You can guarantee that they're not saved by passing in keepTrees = FALSE.
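For illustration, here is a minimal sketch of what such a call might look like. The simulated data and the column names (y, z, x1, x2) are hypothetical stand-ins, and the sketch assumes the installed bartCause forwards keepTrees through to the underlying dbarts fits. It also prints the size of the returned object so the memory footprint can be checked directly.

```r
# Simulated stand-in data; the column names y, z, x1, x2 are hypothetical.
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$z <- rbinom(n, 1, plogis(0.5 * dat$x1))        # binary treatment
dat$y <- 1 + 2 * dat$z + dat$x1 + rnorm(n)         # continuous response

# Only attempt the fit if bartCause is installed.
if (requireNamespace("bartCause", quietly = TRUE)) {
  fit <- bartCause::bartc(
    response    = y,
    treatment   = z,
    confounders = x1 + x2,
    data        = dat,
    estimand    = "ate",
    keepTrees   = FALSE,  # assumption: passed on to the dbarts model fits
    verbose     = FALSE
  )
  print(summary(fit))

  # Report how much memory the returned object occupies.
  print(format(object.size(fit), units = "MB"))
}
```

Running the same size check on a subsample before fitting the full data set should give a rough idea of how the object grows with the number of observations.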

ferlocar commented 5 years ago

Thanks, I'll do that!

However, the latest release for dbarts was recently removed from CRAN due to installation issues. Check it out here: https://cran.r-project.org/web/packages/dbarts/index.html

So, I don't feel confident about installing the latest release. Are you sure it should work just fine?

vdorie commented 5 years ago

Yeah, it was taken down due to one CRAN member's insistence that everything run on Solaris, something with less than 0.3% of the install base for PCs. It's pretty hard to replicate Solaris problems since I don't have access to one of those machines.

The current version works perfectly on Windows, OS X, and numerous flavors of Linux. What's broken on Solaris are additional optimization features, which only make it run faster.

ferlocar commented 5 years ago

Sounds like a pain in the ***, sorry to hear that. Your packages are great.

Thanks for the advice, I'll download the latest version then. Thanks for your hard work and for being so responsive, I really appreciate it.