harrysouthworth / gbm

Gradient boosted models

GBM utilizing only a small amount of compute capacity #17

Closed · ebweinstein closed 9 years ago

ebweinstein commented 10 years ago

Hi! First, thank you for your great work!

Running gbm in RStudio on my 600,000 x 10 data set with 5-fold cross-validation takes close to 24 hours on my i5 desktop with 8 GB of RAM, so I thought I would try Amazon AWS.

I started an AWS Windows instance with 32 cores and 60 GB of RAM to run the same gbm job mentioned above, but it does not seem to be running much quicker. It seems to use only 4 or 5 cores (which, I guess, is because each cross-validation fold is sent to one and only one core) while the other 27 sit idle. It is also using only 14 GB of the 60 GB of available memory.

How can I use the full capacity of my 32-core, 60 GB server?

Neil-Schneider commented 10 years ago

Your guess is correct: each cross-validation fold is sent to one and only one core. GBM is short for Gradient Boosting Machine, and boosting means each tree is built on the residuals of the previous trees, so the individual trees cannot be built in parallel.

Currently, the best ways to reduce the run time are to increase the shrinkage and decrease the number of trees, or to reduce the train.fraction/bag.fraction parameters.
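
For concreteness, a minimal sketch of that advice with a standard gbm call. The formula, the train_df data set, the bernoulli distribution, and all parameter values are placeholders rather than recommendations, and n.cores (one core per CV fold) is only available in newer gbm versions:

```r
library(gbm)

# Hypothetical call: train_df and the outcome y are placeholders for
# your own 600,000 x 10 data set.
fit <- gbm(
  y ~ .,
  data              = train_df,
  distribution      = "bernoulli",
  n.trees           = 1000,  # fewer trees...
  shrinkage         = 0.05,  # ...compensated by a larger learning rate
  interaction.depth = 3,
  bag.fraction      = 0.5,   # subsample rows at each iteration
  train.fraction    = 0.8,   # fit on a fraction, score the remainder
  cv.folds          = 5,
  n.cores           = 5      # one core per fold, if your gbm supports it
)
best.iter <- gbm.perf(fit, method = "cv")  # pick the tree count by CV
```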

harrysouthworth commented 10 years ago

You might want to take a look at the gbt package on GitHub. I haven't properly looked at it yet, but I think the tree building is parallelized. Only Gaussian and binomial deviances are implemented. I guess parallel trees are only useful if you're allowing fairly high interaction depths, though.


ebweinstein commented 10 years ago

Thanks!

ebweinstein commented 10 years ago

Reading the thesis on gbt, the speed-up in model building comes from parallelizing the split search: "As the computations for each feature - trying all the possible splits for the features while computing their costs and best constants - are independent, the idea was to execute them in parallel."

http://gradientboostedmodels.googlecode.com/files/report.pdf

The thesis claims gbt was able to build the same interaction-depth-3 model in a third of the time gbm took!

I look forward to your review of gbt, Harry, and am of course hesitant to use a newer, less tested package. Have you thought about parallelizing gbm in this fashion?
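
To make the thesis's idea concrete, here is a minimal sketch of a feature-parallel search for a single squared-error split, with a constant (the mean residual) fitted in each child. All names are illustrative and this is not gbt's actual code; note that mclapply forks, so on Windows you would substitute parLapply:

```r
library(parallel)

# Best split of one feature: score each candidate cut by the gain of
# fitting the mean residual in each child node (larger is better).
best_split_for_feature <- function(x, resid) {
  v <- sort(unique(x))
  if (length(v) < 2) return(list(cut = NA, score = -Inf))
  cuts <- (head(v, -1) + tail(v, -1)) / 2  # midpoints between values
  score_one <- function(cut) {
    left <- x <= cut
    sum(resid[left])^2 / sum(left) + sum(resid[!left])^2 / sum(!left)
  }
  scores <- vapply(cuts, score_one, numeric(1))
  list(cut = cuts[which.max(scores)], score = max(scores))
}

# The per-feature searches are independent, so farm them out to cores
# and keep the best split found across all features.
find_best_split <- function(X, resid, n_cores = detectCores()) {
  per_feature <- mclapply(seq_len(ncol(X)),
                          function(j) best_split_for_feature(X[, j], resid),
                          mc.cores = n_cores)
  j <- which.max(vapply(per_feature, `[[`, numeric(1), "score"))
  c(list(feature = j), per_feature[[j]])
}
```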

harrysouthworth commented 10 years ago

Alexandre wrote his tree and gradient boosting engines from scratch because the gbm code was somewhat unmanageable. I'm no kind of C++ programmer and wouldn't know where to start. Hopefully, gbt is a lot simpler and can be tidied up and optimized more easily than gbm.

ebweinstein commented 10 years ago

I think there may be a memory leak when running on Ubuntu Linux. Whether I have 8 GB or 80 GB of RAM, R eventually uses all the memory and then crashes when I run a gbm model on a large data set of 600,000 x 200. Windows, on the other hand, uses only 14 GB of RAM for the same model, but I haven't run the entire job on Windows to know whether it really completes without issue.

Update: the job did run successfully on Windows.

az0 commented 10 years ago

When doing k-fold CV, gbm builds the k fold models in parallel and then separately builds the final model, but could it do them all in one pass if there are k+1 cores?
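
Something like this sketch would do it: the k fold models and the full-data model are submitted as one batch of k+1 independent jobs. Here train_df, the formula, and the distribution are placeholders, and this is not gbm's internal CV code:

```r
library(parallel)
library(gbm)

k    <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(train_df)))

fit_task <- function(i) {
  # Jobs 1..k each leave fold i out; job k + 1 uses all of the data.
  d <- if (i <= k) train_df[fold != i, ] else train_df
  gbm(y ~ ., data = d, distribution = "bernoulli",
      n.trees = 1000, shrinkage = 0.05, cv.folds = 0)
}

cl <- makeCluster(k + 1)                  # one worker per job
clusterEvalQ(cl, library(gbm))
clusterExport(cl, c("train_df", "fold", "k"))
fits <- parLapply(cl, seq_len(k + 1), fit_task)
stopCluster(cl)
# fits[[k + 1]] is the final model; fits[[1]]..fits[[k]] supply the
# held-out predictions for choosing the number of trees.
```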

harrysouthworth commented 10 years ago

I think so. Well spotted. I'll add it to the never-ending todo list.


az0 commented 10 years ago

@harrysouthworth Thank you. This will make it twice as fast and could save me days of waiting. :)