zachmayer opened this issue 8 years ago (status: open)
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin14.5.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.8.3 gbm_2.1.1 survival_2.38-3 caret_6.0-58 ggplot2_1.0.1 lattice_0.20-33
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 magrittr_1.5 MASS_7.3-44 munsell_0.4.2 colorspace_1.2-6 foreach_1.4.3 minqa_1.2.4
[8] stringr_1.0.0 car_2.1-0 tools_3.2.2 nnet_7.3-11 pbkrtest_0.4-2 grid_3.2.2 gtable_0.1.2
[15] nlme_3.1-122 mgcv_1.8-8 quantreg_5.19 MatrixModels_0.4-1 iterators_1.0.8 lme4_1.1-10 digest_0.6.8
[22] Matrix_1.2-2 nloptr_1.0.4 reshape2_1.4.1 codetools_0.2-14 stringi_1.0-1 compiler_3.2.2 scales_0.3.0
[29] stats4_3.2.2 SparseM_1.7 proto_0.3-10
I think that it is fixed now. Please test. Also:
Thanks! I'll check it out.
Did this work?
I installed master with: devtools::install_github('topepo/caret/pkg/caret@master')
and re-ran:
library(caret)
library(gbm)
data(iris)
X <- iris[,2:4]
Y <- iris[,1]
gbmFit1 <- train(
X, Y,
method = "gbm", verbose=FALSE,
distribution = list(name="quantile",alpha=0.25),
trControl = trainControl(method = "cv")
)
But I still got an error:
Error in { :
task 1 failed - "arguments imply differing number of rows: 3, 0"
I think the problem is that I'm providing distribution as a list, list(name="quantile",alpha=0.25), rather than as a character variable, "quantile".
This will also be a problem for pairwise metrics, e.g. distribution=list(name="pairwise",group=iris$Species,metric='mrr')
Interesting. It works if you specify trainControl(method = 'none'), but fails if you specify trainControl(method = 'cv', number=5).
I tried all the GBM distributions, with interesting results:
set.seed(1)
library(caret)
library(gbm)
dat <- twoClassSim()
X <- dat[,1:15]
Y <- as.integer(dat[,16]) - 1
ctrl <- trainControl(method = 'cv', number=5)
Working:
train(
X, Y, method='gbm', distribution='gaussian', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, Y, method='gbm', distribution='laplace', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, Y, method='gbm', distribution='tdist', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, Y, method='gbm', distribution='poisson', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, factor(Y), method='gbm', distribution='bernoulli', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, factor(Y), method='gbm', distribution='huberized', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, factor(Y), method='gbm', distribution='adaboost', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, Y, method='gbm', distribution=list(name="tdist", df=8), verbose=FALSE,
trControl=ctrl, tuneLength=1
)
Not working:
train(
X, Y, method='gbm', distribution=list(name="quantile",alpha=0.25), verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, Y, method='gbm', distribution=list(name="pairwise", group=1, metric='mrr'), verbose=FALSE,
trControl=ctrl, tuneLength=1
)
train(
X, Surv(Y), method='gbm', distribution='coxph', verbose=FALSE,
trControl=ctrl, tuneLength=1
)
So quantile, pairwise, and survival models don't work at the moment.
FYI, here's the gbm.fit code for all of the above models:
gbm.fit(X, Y, distribution='gaussian', verbose=FALSE)
gbm.fit(X, Y, distribution='laplace', verbose=FALSE)
gbm.fit(X, Y, distribution='tdist', verbose=FALSE)
gbm.fit(X, Y, distribution='poisson', verbose=FALSE)
gbm.fit(X, factor(Y), distribution='bernoulli', verbose=FALSE)
gbm.fit(X, factor(Y), distribution='huberized', verbose=FALSE)
gbm.fit(X, factor(Y), distribution='adaboost', verbose=FALSE)
gbm.fit(X, Y, distribution=list(name="tdist", df=8), verbose=FALSE)
gbm.fit(X, Y, distribution=list(name="quantile",alpha=0.25), verbose=FALSE)
gbm.fit(X, Y, distribution=list(name="pairwise", group=1, metric='mrr'), verbose=FALSE)
gbm.fit(X, Surv(Y), distribution='coxph', verbose=FALSE)
You can see that they all produce models when gbm.fit is called directly.
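Until train() handles these distributions, one possible workaround is to cross-validate around gbm.fit by hand. The following is only a sketch under stated assumptions: it reuses the iris columns from the earlier comment, and the fold count, n_trees, and the pinball-loss helper are illustrative choices that are not taken from caret or gbm.

```r
library(gbm)

set.seed(1)
X <- iris[, 2:4]
Y <- iris[, 1]

k <- 5          # number of folds (assumed)
alpha <- 0.25   # quantile being modelled, as in the failing example
n_trees <- 100  # number of boosting iterations (assumed)

# Assign each row to one of k folds at random.
fold_id <- sample(rep(seq_len(k), length.out = nrow(X)))

# Hypothetical helper: pinball (quantile) loss on held-out data.
pinball <- function(obs, pred, alpha) {
  mean(ifelse(obs >= pred, alpha * (obs - pred), (1 - alpha) * (pred - obs)))
}

# Fit on k - 1 folds, score on the remaining fold.
cv_loss <- sapply(seq_len(k), function(i) {
  train_rows <- fold_id != i
  fit <- gbm.fit(
    X[train_rows, ], Y[train_rows],
    distribution = list(name = "quantile", alpha = alpha),
    n.trees = n_trees, verbose = FALSE
  )
  pred <- predict(fit, X[!train_rows, ], n.trees = n_trees)
  pinball(Y[!train_rows], pred, alpha)
})

mean(cv_loss)
```

This skips everything else train() provides (tuning grids, resampling summaries, the final refit), so it is a stopgap for checking that a quantile model behaves sensibly out of sample, not a replacement.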
Was this ever resolved? I am still receiving a similar error when using certain distributions with method='gbm'. This will work:
devtools::install_github("gbm-developers/gbm")
fit1 <- gbm.fit(x,y,distribution="gamma")
but this returns an error:
library(caret)
fit2 <- train(x,y, method='gbm', distribution='gamma', trControl=ctrl, tuneLength=1)
task 1 failed - "arguments imply differing number of rows: 3, 0"
The last model will run fine if the distribution is changed to 'gaussian'.
I think maybe caret isn't properly handling the predictions coming from the quantile regression GBM, but I am not sure.
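For what it's worth, the error text itself is what base R's data.frame() emits when it is given columns of different lengths. The sketch below reproduces the same message with a length-3 vector and an empty one. The names obs and pred are made up for illustration, and what the 3 and the 0 correspond to inside caret is not established in this thread.

```r
# Three values in one column, but a zero-length vector in the other
# (an assumed stand-in for whatever caret combines internally).
obs  <- c(5.1, 4.9, 4.7)
pred <- numeric(0)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    data.frame(obs = obs, pred = pred)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

If the empty side really is the predictions, that would be consistent with the guess above, but confirming it would require stepping through caret's resampling code.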