Currently, if the response contains an NA, a clear error message is thrown:
data <- data.frame(x = rnorm(50), y = c(rnorm(49), NA))
m <- xrf(y ~ x, data, family = 'gaussian', xgb_control = list(nrounds = 1, max_depth = 2))
Error in xrf_preconditions(family, xgb_control, glm_control, data, response_var, :
Response variable contains missing values which is not allowed
However, if any predictor contains an NA, the *model.matrix implementation silently drops that row. The design matrix then has fewer rows than the response vector, and the failure only surfaces later as a confusing xgboost error:
data <- data.frame(y = rnorm(50), x = c(rnorm(49), NA))
m <- xrf(y ~ x, data, family = 'gaussian', xgb_control = list(nrounds = 1, max_depth = 2))
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) :
The length of labels must equal to the number of rows in the input data
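The mismatch can be seen without xrf or xgboost at all. The following is a minimal sketch in base R, assuming the default `options(na.action = "na.omit")` and assuming that xrf's internals behave like plain `model.matrix` here (the issue only says "*model.matrix"):

```r
# Base R only; reproduces the row mismatch without calling xrf or xgboost.
set.seed(1)
data <- data.frame(y = rnorm(50), x = c(rnorm(49), NA))

# model.matrix builds a model frame with the default na.action (na.omit),
# so the row with the missing predictor disappears without any warning.
design <- model.matrix(y ~ x, data)

n_design <- nrow(design)    # rows that survive in the design matrix
n_labels <- length(data$y)  # labels still cover every original row
```

Here `n_design` and `n_labels` differ by one, which is the inconsistency the label-length error above is complaining about.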
Several fixes may make sense:
Fail fast and clearly with a preconditions check.
Offer several (configurable) remediation methods, like dropping offending rows or mean/mode imputation.
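Both options could look roughly like the sketch below. It is only an illustration of the idea: `check_predictors_complete`, `remediate_na`, and the `method` argument are hypothetical names, not part of xrf, and the sketch treats every column other than the response as a predictor.

```r
# Hypothetical helper, not part of xrf: fail fast when any predictor has NAs.
check_predictors_complete <- function(data, response_var) {
  predictors <- setdiff(names(data), response_var)
  has_na <- vapply(data[predictors], anyNA, logical(1))
  if (any(has_na)) {
    stop("Predictor(s) contain missing values: ",
         paste(predictors[has_na], collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical remediation step: fail, drop offending rows, or impute
# the mean (numeric columns) or the mode (factor/character columns).
remediate_na <- function(data, response_var,
                         method = c("fail", "drop", "impute")) {
  method <- match.arg(method)
  predictors <- setdiff(names(data), response_var)
  if (method == "fail") {
    check_predictors_complete(data, response_var)
    return(data)
  }
  if (method == "drop") {
    return(data[complete.cases(data[predictors]), , drop = FALSE])
  }
  for (p in predictors) {
    missing <- is.na(data[[p]])
    if (!any(missing)) next
    if (is.numeric(data[[p]])) {
      data[[p]][missing] <- mean(data[[p]], na.rm = TRUE)
    } else {
      counts <- table(data[[p]])  # table() ignores NA by default
      data[[p]][missing] <- names(counts)[which.max(counts)]
    }
  }
  data
}

# Usage on the failing example from this issue.
data <- data.frame(y = rnorm(50), x = c(rnorm(49), NA))
dropped <- remediate_na(data, "y", method = "drop")
imputed <- remediate_na(data, "y", method = "impute")
failed  <- try(remediate_na(data, "y", method = "fail"), silent = TRUE)
```

Defaulting to `"fail"` keeps the behaviour consistent with the existing response check, while `"drop"` and `"impute"` remain opt-in.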