proposal: add offset to Learner_Task, specified as the column name of offsets in the input data. The name of the offset column is stored in private$.nodes$offset. Need to settle on the default values for the offsets.
Currently, the active members Learner_Task$weights and Learner_Task$id return a vector of defaults, even if the user specified no weights / id.
Keep the default Learner_Task$id as a vector seq_len(nobs) in all cases. These ids are most likely never going to be used for learner training, only for executing CV-schemes (not 100% sure).
The current default for Learner_Task$weights is a vector rep.int(1,nobs). While this seems to work for all currently used learners, it might be undesirable to always pass this to every single learner (i.e., we may not want to provide default weights to all learners when the user is not anticipating that and hasn't requested it explicitly). In general, always using default weights should never change the model fit, but it might affect learner performance (unclear, but best to be safe). So there needs to be a clear mechanism for using or not using the weights within each learner:
proposal: add an argument use_weights = FALSE to all learners that are capable of using the weights. Default is to never use weights, unless specifically requested, even if the task included new, user-defined weights. Doesn't seem like a good option, adding too much complexity. @jeremyrcoyle thoughts?
alternative proposal: set default Learner_Task$weights to NULL and only use weights when !is.null(...). When some non-NULL weights are necessary as defaults, generate those on the fly as rep.int(1L,nobs). @jeremyrcoyle thoughts?
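A minimal sketch of how the NULL-default alternative could look inside a learner. The names resolve_weights and fit_glm are hypothetical and are not part of the package; stats::glm.fit stands in for any learner that accepts weights.

```r
# Hypothetical helper (an assumption for illustration, not existing API):
# returns user weights when present, and generates unit weights on the fly
# only when the caller says a non-NULL vector is required.
resolve_weights <- function(weights, nobs, required = FALSE) {
  if (!is.null(weights)) {
    stopifnot(length(weights) == nobs)
    return(weights)
  }
  if (required) rep.int(1L, nobs) else NULL
}

# Hypothetical learner: weights are passed on only when the task has them.
fit_glm <- function(x, y, weights = NULL) {
  args <- list(x = cbind(1, x), y = y, family = gaussian())
  w <- resolve_weights(weights, length(y))
  if (!is.null(w)) args$weights <- w
  do.call(stats::glm.fit, args)
}

set.seed(1)
x <- rnorm(20)
y <- 2 * x + rnorm(20)

fit_default <- fit_glm(x, y)                            # no weights passed
fit_unit    <- fit_glm(x, y, weights = rep.int(1L, 20)) # explicit unit weights

# Unit weights should not change the fit.
all.equal(fit_default$coefficients, fit_unit$coefficients)
```

With this shape, a learner that cannot run without a weights vector would call resolve_weights(..., required = TRUE), and every other learner never sees weights the user did not supply.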
Setting neutral defaults for offset might create a lot of confusion. The neutral default offset may change with the link function, depending on how these offsets are used. For instance, for family="binomial" with GLM, the input offset will not be converted to the logit-linear scale. If this conversion of the offset occurs inside the learner (e.g., TMLE learner), then the default offset of 0 would imply that the actual offset used is qlogis(0)=-Inf. A default of 0 will work fine only if all the learners assume that the offsets are already transformed to the scale of the link function, and that will be hard to enforce and maintain.
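To make the scale issue concrete, here is a small illustration using only base R and stats::glm on simulated data (the data and variable names are made up for the example). It shows that 0 is neutral only on the scale of the linear predictor.

```r
# qlogis() is the logit function and plogis() is its inverse.
qlogis(0)     # -Inf: an offset of 0 read as a probability has no finite logit
qlogis(0.5)   # 0: the neutral offset on the logit scale
plogis(0)     # 0.5

set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))

# stats::glm uses the offset as given, on the scale of the linear predictor,
# so a zero offset leaves the fit unchanged.
fit_none <- glm(y ~ x, family = binomial())
fit_zero <- glm(y ~ x, family = binomial(), offset = rep(0, n))
all.equal(coef(fit_none), coef(fit_zero))   # TRUE
```

A learner that applied qlogis() to the same default of 0 would instead feed -Inf into the fit, which is why a single task-level default cannot be neutral for both conventions.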
proposal: Set default offset=0. Add an argument use_offset = FALSE to all learners that are capable of using the offset. Default is to never use offset, unless specifically requested, even if the task included new, user-defined offset. Doesn't seem like a good option, too complex. @jeremyrcoyle thoughts?
alternative proposal: set default Learner_Task$offset to NULL and only use offset when !is.null(...). That way either the user is always responsible for providing interpretable offsets or the learner is responsible for generating correct offsets (on the right scale). @jeremyrcoyle thoughts?
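A rough sketch of the NULL-default offset, assuming the offset is stored as a column name in the task's nodes as proposed above. make_task, task_offset and fit_binomial are hypothetical stand-ins written for this example and do not reflect the actual Learner_Task implementation.

```r
# Hypothetical stand-in for Learner_Task: stores column names only.
make_task <- function(data, outcome, covariates, offset = NULL) {
  nodes <- list(outcome = outcome, covariates = covariates, offset = offset)
  stopifnot(all(unlist(nodes) %in% names(data)))
  list(data = data, nodes = nodes)
}

# Returns the offset vector, or NULL when the user specified no offset column.
task_offset <- function(task) {
  col <- task$nodes$offset
  if (is.null(col)) NULL else task$data[[col]]
}

# Hypothetical learner: the offset is assumed to be on the logit scale already
# and is passed through untouched; stats::glm.fit treats NULL as "no offset".
fit_binomial <- function(task) {
  x <- cbind(1, as.matrix(task$data[task$nodes$covariates]))
  y <- task$data[[task$nodes$outcome]]
  stats::glm.fit(x, y, family = binomial(), offset = task_offset(task))
}

d <- data.frame(
  y   = c(0, 1, 0, 1, 1, 0, 1, 0),
  x   = c(-1, 0.5, -0.3, 1.2, 0.1, -0.8, 0.4, 0.9),
  off = rep(0.2, 8)
)

fit_no_offset   <- fit_binomial(make_task(d, "y", "x"))
fit_with_offset <- fit_binomial(make_task(d, "y", "x", offset = "off"))
```

Under this design the learner never has to guess whether a vector of zeros was a user choice or a default: a NULL means no offset was requested.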