zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/

train.model fails with any arguments to param.set #7

Closed: akg2685 closed this issue 4 years ago

akg2685 commented 4 years ago

I've tried to pass extra parameters for the mlr run to train.model via param.set, but it fails with the following error:

Error: Assertion failed. One of the following must apply:

I've tried it with an elastic net model setting the alpha and with a random forest model setting ntree and mtry. I've tried with several different reasonable values for these parameters and in all cases the error is the same.

jakob-wirbel commented 4 years ago

Could you provide some minimal code for a reproducible example? Maybe using the siamcat_example object?

akg2685 commented 4 years ago

I can reproduce it with the basic vignette data and code, except trying to use enet or randomForest with arguments instead of lasso:

library(SIAMCAT)

data("feat_crc_zeller", package = "SIAMCAT")
data("meta_crc_zeller", package = "SIAMCAT")

label.crc.zeller <- create.label(meta=meta.crc.zeller, label='Group', case='CRC')

siamcat <- siamcat(feat=feat.crc.zeller, label=label.crc.zeller, meta=meta.crc.zeller)

siamcat <- filter.features(siamcat, filter.method = 'abundance', cutoff = 0.001)

siamcat <- normalize.features(siamcat, norm.method = "log.unit",
    norm.param = list(log.n0 = 1e-06, n.p = 2, norm.margin = 1))

siamcat <- create.data.split(siamcat, num.folds = 5, num.resample = 2)

# Fails with the error message provided in the post above
siamcat <- train.model(siamcat, method = "enet", param.set = list(alpha = 0.5))

# Fails with the error message provided in the post above
siamcat <- train.model(siamcat, method = "randomForest",
    param.set = list(ntree = 500,
        mtry = floor(sqrt(nrow(filt_feat(siamcat)$filt.feat)))))

# Works fine
siamcat <- train.model(siamcat, method = "lasso")

jakob-wirbel commented 4 years ago

Ah okay, fair enough, I see where the issue is. For each hyper-parameter that you want to set, mlr expects a lower and an upper bound, since the hyper-parameter is tuned in an internal cross-validation. Therefore, you need to supply two values for it in the param.set list.

For example,

siamcat <- train.model(siamcat, method = "enet", param.set = list(alpha = 0.5))

will not work, while

siamcat <- train.model(siamcat, method = "enet", param.set = list(alpha = c(0.1, 0.4)))

will work. The same applies to mtry and ntree for randomForest, and to cost for the lasso_ll method.
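If you build param.set programmatically, a quick shape check before calling train.model can catch single values early. This is a minimal sketch assuming the c(lower, upper) convention described above; bad_param_entries is a hypothetical helper, not part of SIAMCAT, and it only inspects the shape of the list, not whether the bounds make sense for the chosen method.

```r
# Hypothetical helper, NOT part of SIAMCAT: returns the names of param.set
# entries that are not a numeric c(lower, upper) pair (assumed convention:
# exactly two non-NA numbers with lower <= upper).
bad_param_entries <- function(param.set) {
  ok <- vapply(param.set, function(x) {
    is.numeric(x) && length(x) == 2 && !anyNA(x) && x[1] <= x[2]
  }, logical(1))
  names(param.set)[!ok]
}

bad_param_entries(list(alpha = 0.5))           # "alpha"
bad_param_entries(list(alpha = c(0.1, 0.4)))   # character(0)
```

An empty result means every entry follows the two-value form; any returned name points to a parameter that was given as a single value (or as malformed bounds).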

jakob-wirbel commented 4 years ago

But thanks for pointing that out; I guess it shows that we should update the documentation a bit...