harrysouthworth / gbm

Gradient boosted models

Encounter NaN depending on number of training observations and setting for shrinkage parameter in gbm.fit() #47

Closed jimthompson5802 closed 9 years ago

jimthompson5802 commented 9 years ago

Depending on the number of observations used for training and the setting of the shrinkage parameter, I encounter the following problem in gbm.fit():

> mdl <- gbm.fit(x=train.df[,1:(ncol(train.df)-1)],
+                 y=train.df[,ncol(train.df)],
+                 distribution = "multinomial",
+   .... [TRUNCATED] 
Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        2.1972             nan     0.1000    0.6183
     2        1.8424             nan     0.1000    0.3274
     3        1.6543             nan     0.1000    0.2367
     4        1.5193             nan     0.1000    0.1802
     5        1.4185             nan     0.1000    0.1462
     6        1.3352             nan     0.1000    0.1213
     7        1.2657             nan     0.1000    0.0945
     8        1.2104             nan     0.1000    0.0811
     9        1.1629             nan     0.1000    0.0742
    10        1.1200             nan     0.1000    0.0609
    20        0.8972             nan     0.1000    0.0206
    40        0.7478             nan     0.1000    0.0040
    60        0.6810             nan     0.1000    0.0014
    80        0.6490             nan     0.1000       nan
   100           nan             nan     0.1000       nan
   120           nan             nan     0.1000       nan
   140           nan             nan     0.1000       nan
   150           nan             nan     0.1000       nan
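To pin down where the run goes bad, the printed log can be scanned for the first NaN in each column. The sketch below is base R only and uses the values copied from the output above; `first_nan_iter` is a hypothetical helper name, not part of gbm. Because the log prints only selected iterations, the result is the first *printed* iteration showing NaN, not necessarily the exact iteration where it first occurred.

```r
# Values copied from the verbose log above (printed iterations only).
log_df <- data.frame(
  iter = c(1:10, 20, 40, 60, 80, 100, 120, 140, 150),
  train_deviance = c(2.1972, 1.8424, 1.6543, 1.5193, 1.4185, 1.3352,
                     1.2657, 1.2104, 1.1629, 1.1200, 0.8972, 0.7478,
                     0.6810, 0.6490, NaN, NaN, NaN, NaN),
  improve = c(0.6183, 0.3274, 0.2367, 0.1802, 0.1462, 0.1213,
              0.0945, 0.0811, 0.0742, 0.0609, 0.0206, 0.0040,
              0.0014, NaN, NaN, NaN, NaN, NaN)
)

# Hypothetical helper: first iteration at which `values` is NaN, or NA if none.
first_nan_iter <- function(iter, values) {
  hits <- which(is.nan(values))
  if (length(hits) == 0) NA_real_ else iter[hits[1]]
}

first_improve_nan  <- first_nan_iter(log_df$iter, log_df$improve)
first_deviance_nan <- first_nan_iter(log_df$iter, log_df$train_deviance)
```

On a fitted model, the same helper could presumably be applied to the per-iteration training deviance stored on the object (in gbm this should be `mdl$train.error`), which would give the exact iteration instead of the nearest printed one.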

I have source code and data in a zip file that can reproduce this problem. How can I provide this?
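Since the problem is reported to depend on both the number of training observations and the shrinkage setting, a small grid scan can make a reproduction easier to share. This is only a sketch: `scan_for_nan` and `fit_fun` are hypothetical names, and the commented-out `gbm.fit()` call is an assumption about how the truncated call above might be completed (in particular `n.trees = 150` is guessed from the last logged iteration).

```r
# Hypothetical helper: call `fit_fun(n_obs, shrinkage)` for every grid cell.
# `fit_fun` must return the per-iteration training deviance as a numeric vector;
# the result records whether that vector contains any NaN.
scan_for_nan <- function(fit_fun, n_obs_grid, shrinkage_grid) {
  grid <- expand.grid(n_obs = n_obs_grid, shrinkage = shrinkage_grid)
  grid$has_nan <- mapply(
    function(n, s) any(is.nan(fit_fun(n, s))),
    grid$n_obs, grid$shrinkage
  )
  grid
}

# Assumed wiring to gbm (not run here; requires the gbm package and train.df):
# fit_fun <- function(n, s) {
#   rows <- seq_len(n)
#   gbm::gbm.fit(x = train.df[rows, 1:(ncol(train.df) - 1)],
#                y = train.df[rows, ncol(train.df)],
#                distribution = "multinomial",
#                n.trees = 150,        # assumption, see lead-in
#                shrinkage = s,
#                verbose = FALSE)$train.error
# }
# scan_for_nan(fit_fun, c(500, 1000, 5000), c(0.01, 0.05, 0.1))
```

The grid values in the last comment are placeholders; the table returned by `scan_for_nan` shows which combinations of training size and shrinkage trigger the NaN.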

az0 commented 9 years ago

GitHub does not allow attaching these to issues. For source, I often use Gist and post a link. For zip and general files, you could post them on your favorite file hosting service (e.g., Dropbox, Google Drive) and post a link.

jimthompson5802 commented 9 years ago

az0...thank you for the guidance.

Here is the Google Drive link to the zip file: https://drive.google.com/open?id=0B95myaZR5glcfm5XaXM3aGYyRkZQbG9hN1BBdElaNUVwcXVsZDdyN2tpMzYxeV94LWxyY28&authuser=0

pdmetcalfe commented 9 years ago

Hi - thanks for reporting this and providing an example. There's both good news and bad news!

The good news is that gbm development is now more active over at @gbm-developers. Please carry on the conversation there!

The bad news is that multinomial currently has all sorts of errors that we are trying to shake out, including segmentation faults.

jimthompson5802 commented 9 years ago

@pdmetcalfe...are you suggesting that I repost my problem description @gbm-developers?

pdmetcalfe commented 9 years ago

Yup.

On 7 April 2015 at 21:16, Jim Thompson wrote:

@pdmetcalfe...are you suggesting that I repost my problem description @gbm-developers?