lishiwei2011 / gradientboostedmodels

Automatically exported from code.google.com/p/gradientboostedmodels

Error in interact.gbm with multinomial family #14

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
It seems that interact.gbm does not work with family=multinomial:

> data(iris)
> set.seed(10)
> f<-gbm(Species~.,data=iris,n.trees=1000,interaction.depth=2)
Distribution not specified, assuming multinomial ...
> interact.gbm(f,data=iris,i.var=c(1,2))
Error in weighted.mean.default(f, n) : 
  'x' and 'w' must have the same length

It works fine with the bernoulli family:

> set.seed(10)
> f1<-gbm(I(Species=="setosa")~.,data=iris,n.trees=1000,interaction.depth=2)
Distribution not specified, assuming bernoulli ...
> interact.gbm(f1,data=iris,i.var=c(1,2))
[1] 0.8861943

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=sv_SE.UTF-8       LC_NUMERIC=C               LC_TIME=sv_SE.UTF-8        LC_COLLATE=sv_SE.UTF-8    
 [5] LC_MONETARY=sv_SE.UTF-8    LC_MESSAGES=sv_SE.UTF-8    LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  
[8] methods   base     

other attached packages:
[1] Hmisc_3.10-1     gbm_2.0-9.5      lattice_0.20-6   survival_2.36-14

loaded via a namespace (and not attached):
[1] cluster_1.14.3 grid_2.15.3    tools_2.15.3  

Original issue reported on code.google.com by erik.la...@gmail.com on 25 Mar 2013 at 4:46

GoogleCodeExporter commented 8 years ago
Reproduced and accepted. Builds after 2-9.5 fail even with Bernoulli.

Original comment by harry.southworth on 11 Apr 2013 at 3:20

GoogleCodeExporter commented 8 years ago
I've encountered various issues here.

I agree that with dist="multinomial", it's a bug. I'll fix it.

Beyond that, I sometimes get values > 1, which ought to be impossible. At 
other times I get NaNs, and sometimes the values are highly unstable, varying 
widely from run to run.

I think there are reasonable explanations.

If a feature never makes it into the model (has relative influence 0), the 
numerator and denominator of H are both zero, so we get 0/0 = NaN.
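
For reference, here is Friedman and Popescu's pairwise H statistic, assuming interact.gbm follows this definition, with each partial dependence function $\hat F$ centered to zero mean over the $n$ data points:

```latex
H_{jk}^2 = \frac{\sum_{i=1}^{n} \left[ \hat F_{jk}(x_{ij}, x_{ik}) - \hat F_j(x_{ij}) - \hat F_k(x_{ik}) \right]^2}
                {\sum_{i=1}^{n} \hat F_{jk}^2(x_{ij}, x_{ik})}
```

Both sums vanish together when $\hat F_{jk}$ is identically zero after centering, for example when neither $x_j$ nor $x_k$ is ever used in a split, and the ratio is then $0/0$, which R evaluates to NaN. When the denominator is merely tiny, small absolute rounding errors in the numerator become large relative errors in the ratio.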

If a feature has relative influence close to zero, rounding errors combined 
with the computation of very small quantities can produce values > 1.

With simulated data with real interactions, I get answers that appear to be 
reasonable.

I'd be grateful to anyone else who investigates this and provides further 
insight.

Harry

Original comment by harry.southworth on 19 Apr 2013 at 3:41

GoogleCodeExporter commented 8 years ago
I've fixed the multinomial bug, added some comments to the help file, and made 
the function replace values of H > 1 with NaN.

Original comment by harry.southworth on 10 May 2013 at 1:12