lishiwei2011 / gradientboostedmodels

Automatically exported from code.google.com/p/gradientboostedmodels

ModelMap fails CRAN checks #1

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The issue is that gbm.fit no longer returns train.fraction in the model
object. A related issue has been described by Elisabeth Freeman:

Thank you for the windows binary.

I have started testing ModelMap with the new gbm package, and I think at least
some of the issues are related to the change in argument names in gbm.fit()
from ‘train.fraction’ to ‘nTrain’. ModelMap not only used the old
argument name (which just generates a warning), but for prediction it also
extracted the value of ‘train.fraction’ from the model object by name, and
the new model objects created by gbm.fit() do not have a component by that
name, resulting in an error.
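
Pending a fix in gbm itself, downstream code could read the training size from
whichever component is present. A minimal sketch of such an accessor (the
helper name and the fallback logic are illustrative, not ModelMap's actual
code):

# Illustrative helper: return the number of training rows for either an
# old-style (train.fraction) or new-style (nTrain) gbm model object.
getTrainRows <- function(object, n) {
    if (!is.null(object$nTrain)) {
        object$nTrain                     # new gbm.fit() objects store a count
    } else if (!is.null(object$train.fraction)) {
        round(object$train.fraction * n)  # older objects store a fraction of n
    } else {
        n                                 # default: all rows used for training
    }
}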

I can update ModelMap to use nTrain, but I also ran into an issue with the
gbm.more() function. This function still seems to require that the model
object have a component named ‘train.fraction’. For example, in this line
of code from the gbm.more function:

          num.groups.train <- max(1, round(object$train.fraction * nlevels(group)))

Model objects created by the new gbm() still have a train.fraction component, 
but objects created by the new gbm.fit() only have the ‘nTrain’ component. 
Since ModelMap is often used on large data sets, it uses the gbm.fit() function 
for model building. When I use gbm.more() on these models, I get the following 
result:

model.obj
gbm.more(object = SGB, n.new.trees = 100)
A gradient boosted model with gaussian loss function.
1200 iterations were performed.
Error in if (x$train.fraction < 1) { : argument is of length zero

(The "argument is of length zero" error occurs because the model object has no
train.fraction component, so x$train.fraction is NULL and the condition
x$train.fraction < 1 evaluates to logical(0).)
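
A defensive guard inside gbm.more() could fall back to nTrain when
train.fraction is absent. The following is only a sketch of the idea, not an
actual patch; it assumes object$fit holds one fitted value per row supplied to
gbm.fit(), which is how the total row count is recovered here:

# Illustrative guard, not a shipped fix: recover the training fraction
# from nTrain when the model object lacks a train.fraction component.
train.fraction <- object$train.fraction
if (is.null(train.fraction)) {
    train.fraction <- object$nTrain / length(object$fit)
}
num.groups.train <- max(1, round(train.fraction * nlevels(group)))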

Here is some sample code, adapted from the gbm help files, that shows the
issue I run into when using the gbm.more function on models fitted with
gbm.fit():

################################################################################

N <- 1000
X1 <- runif(N)
X2 <- 2*runif(N)
X3 <- ordered(sample(letters[1:4],N,replace=TRUE),levels=letters[4:1])
X4 <- factor(sample(letters[1:6],N,replace=TRUE))
X5 <- factor(sample(letters[1:3],N,replace=TRUE))
X6 <- 3*runif(N)
mu <- c(-1,0,1,2)[as.numeric(X3)]

SNR <- 10 # signal-to-noise ratio
Y <- X1**1.5 + 2 * (X2**.5) + mu
sigma <- sqrt(var(Y)/SNR)
Y <- Y + rnorm(N,0,sigma)

# introduce some missing values
X1[sample(1:N,size=500)] <- NA
X4[sample(1:N,size=300)] <- NA

X <- data.frame(X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)

data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)

# fit initial model with gbm function
gbm1 <-
gbm(Y~X1+X2+X3+X4+X5+X6,         # formula
    data=data,                   # dataset
    var.monotone=c(0,0,0,0,0,0), # -1: monotone decrease,
                                 # +1: monotone increase,
                                 #  0: no monotone restrictions
    distribution="gaussian",     # see the help for other choices
    n.trees=1000,                # number of trees
    shrinkage=0.05,              # shrinkage or learning rate,
                                 # 0.001 to 0.1 usually work
    interaction.depth=3,         # 1: additive model, 2: two-way interactions, etc.
    bag.fraction = 0.5,          # subsampling fraction, 0.5 is probably best
    train.fraction = 0.5,        # fraction of data for training,
                                 # first train.fraction*N used for training
    n.minobsinnode = 10,         # minimum total weight needed in each node
    keep.data=TRUE,              # keep a copy of the dataset with the object
    verbose=FALSE)               # don't print out progress

# fit initial model with gbm.fit function
gbm1fit <-
gbm.fit(x=X,y=Y,
    var.monotone=c(0,0,0,0,0,0), # -1: monotone decrease,
                                 # +1: monotone increase,
                                 #  0: no monotone restrictions
    distribution="gaussian",     # see the help for other choices
    n.trees=1000,                # number of trees
    shrinkage=0.05,              # shrinkage or learning rate,
                                 # 0.001 to 0.1 usually work
    interaction.depth=3,         # 1: additive model, 2: two-way interactions, etc.
    bag.fraction = 0.5,          # subsampling fraction, 0.5 is probably best
    nTrain = nrow(X)*0.5,        # number of rows for training,
                                 # first nTrain rows used for training
    n.minobsinnode = 10,         # minimum total weight needed in each node
    keep.data=TRUE,              # keep a copy of the dataset with the object
    verbose=FALSE)               # don't print out progress

names(gbm1)
names(gbm1fit)

# do another 100 iterations
gbm2 <- gbm.more(gbm1,100,verbose=FALSE) # stop printing detailed progress
gbm2

# do another 100 iterations
gbm2fit <- gbm.more(gbm1fit,100,verbose=FALSE) # stop printing detailed progress
gbm2fit

# add train.fraction to gbm1fit model object as a workaround
gbm1fit$train.fraction <- 0.5
# do another 100 iterations
gbm2fit <- gbm.more(gbm1fit,100,verbose=FALSE) # stop printing detailed progress
gbm2fit
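
The hard-coded 0.5 above only matches this example. A more general workaround
(again just a sketch) is to reconstruct train.fraction from the nTrain
component that the new gbm.fit() does return, where X is the data frame passed
to gbm.fit():

# General workaround (sketch): derive train.fraction from nTrain so that
# gbm.more() finds the component it expects.
if (is.null(gbm1fit$train.fraction)) {
    gbm1fit$train.fraction <- gbm1fit$nTrain / nrow(X)
}
gbm2fit <- gbm.more(gbm1fit, 100, verbose=FALSE)
gbm2fit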

Original issue reported on code.google.com by harry.southworth on 9 Jan 2013 at 10:03

GoogleCodeExporter commented 8 years ago
According to the ModelMap maintainer, this is now fixed. Whilst using
ModelMap, she gets the warning about train.fraction being deprecated, but that
doesn't affect the CRAN checks.

Original comment by harry.southworth on 17 Jan 2013 at 4:50

GoogleCodeExporter commented 8 years ago

Original comment by harry.southworth on 21 Jan 2013 at 9:41