jenfb / bkmr

Bayesian kernel machine regression
50 stars 19 forks source link

Run slowly when data was huge (rows of data over 10000). #7

Closed Cjjsmu closed 2 years ago

Cjjsmu commented 5 years ago

Dear author, Sorry to bother you! I met a trouble when runnig the bkmr codes. My data contains 28000+ rows. So it will crash and report a error as cann't allocate vector of ... Gb. After I modified the codes to handle big matrix and huge data, I discovered the speed of running codes was so slowly that the codes are not pratical. Thus, here, I want to consult you for whether you consider and handle this situation? What can you advise me to deal with it? I am looking forward to receiving your reply!

jenfb commented 5 years ago

The Gaussian predictive process (GPP) approach has been implemented with BKMR, which is a faster version that works with larger datasets. For GPP, you have to specify a matrix of knots (see the "knots" argument). However, I haven't tested the code for such a large dataset as you are working with, so even the GPP approach may not suffice.