harrysouthworth / gbm

Gradient boosted models

Memory leak with "laplace" still present? #11

Closed: johnrolfeellis closed this issue 9 years ago

johnrolfeellis commented 10 years ago

I'm experiencing a significant memory leak in 2.1-0.3 with the "laplace" distribution, similar to what is described here:

https://code.google.com/p/gradientboostedmodels/issues/detail?id=32

The file laplace.cpp was fixed at line 97 in 2.1-0.3 to address this. But even though I'm running 2.1-0.3 (at least that's what library(gbm) says), I'm still getting a memory leak.
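
One way to double-check which gbm build is actually being loaded (a minimal sketch using standard utils functions; nothing here is specific to my setup):

packageVersion("gbm")               # installed version
packageDescription("gbm")$Built     # build string for the installed binary
find.package("gbm")                 # library path the package loads from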

The training set has 280K rows with 11 columns, and the parameters are:

gbm.fit(trainingSet, outcomes, nTrain = 279870, distribution = "laplace", interaction.depth = 4, n.trees = 1000)

which uses 1.1 GB of working set and 4.7 GB of committed memory. Very roughly, adding 2000 more trees uses an additional 2 GB of working set and 4-5 GB of committed memory. But when the distribution is changed to "gaussian", the working set and committed memory stay around 550 MB, no matter how many trees.
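
For anyone trying to reproduce this without my data, a minimal sketch along these lines (synthetic data standing in for the real training set) should show the same pattern. Note that memory.size() is Windows-only, and since the leak appears to be in native C++ memory rather than the R heap, watching the process in Task Manager is more reliable than gc():

library(gbm)

set.seed(1)
n <- 279870
trainingSet <- as.data.frame(matrix(rnorm(n * 11), ncol = 11))
outcomes <- rnorm(n)

for (dist in c("gaussian", "laplace")) {
    fit <- gbm.fit(trainingSet, outcomes, nTrain = n,
                   distribution = dist,
                   interaction.depth = 4, n.trees = 1000)
    cat(dist, "- memory.size():", memory.size(), "MB\n")
    rm(fit)
    gc()
}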

I tried building the master branch, but gbm.fit() doesn't appear to work:

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1           nan             nan     0.0010       nan
     2           nan             nan     0.0010       nan
     3           nan             nan     0.0010       nan

(R version 3.0.2, gbm 2.1-0.3, Windows 7 64-bit.)

johnrolfeellis commented 10 years ago

Also, I can package up the training set and make it available privately if that would help.
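
Something like this is what I had in mind for packaging it (saveRDS with xz compression; the file name is just a placeholder):

saveRDS(list(trainingSet = trainingSet, outcomes = outcomes),
        file = "gbm-laplace-leak-data.rds", compress = "xz")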

harrysouthworth commented 9 years ago

This issue was moved to gbm-developers/gbm#16