While the error that prompted this PR was raised by `auroc()`, the issue stems from the `predict()` function.
The lack of a more explicit warning against near-zero-variance features in `block.splsda()` will be addressed in a separate PR.
The walkthrough below follows the framework presented in the reprex in the associated GitHub issue (here).
Take a given feature in one of the predictor blocks. If it is all 0s:
The feature is centered and scaled in `block.splsda()`. The result is the same all-zero vector, with center = 0 and scale = 0.
`object$X` is used as the `newdata` parameter for the `predict()` call in `auroc()`.
Within `predict()`, the 0 values have 0 subtracted from them (centering) and are then divided by 0 (scaling), resulting in `NaN` for those predictor values. (In R, 0/0 evaluates to `NaN`.)
`NaN`s can be handled safely (ignored) by the remainder of the function, resulting in valid predictions.
If that feature is all the same non-zero value (e.g. all equal to 1):
The feature is centered and scaled in `block.splsda()`. The result is an all-zero vector, but with center = 1 and scale = 0. This is stored in `object$X`.
`object$X` is used as the `newdata` parameter for the `predict()` call in `auroc()`.
Within `predict()`, the 0 values have 1 subtracted from them (centering) and are then divided by 0 (scaling), resulting in infinite values for those predictor values. (In R, 1/0 evaluates to `Inf` and -1/0 to `-Inf`.)
Infinite values CANNOT be handled safely by the remainder of the function, causing all predictions made on that block to be `NaN`. This results in downstream issues, such as the error raised by `auroc() -> statauc() -> roc.default() -> roc.utils.perfs.fast.all.threshold() -> cut()`.
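The two cases above can be reproduced in base R without mixOmics. This is a minimal sketch, not the package's internal code: the toy matrix `X` and the variable names are hypothetical, and `sweep()` stands in for whatever centering and scaling `predict()` performs internally.

```r
# Hypothetical toy block with two constant features (not taken from the reprex).
X <- cbind(all_zero = rep(0, 5), all_one = rep(1, 5))

centers <- colMeans(X)        # 0 and 1
scales  <- apply(X, 2, sd)    # 0 and 0, since both features have zero variance

# Stand-in for the stored, already-centered training block: all zeros in both columns.
stored <- sweep(X, 2, centers, "-")

# Re-apply the stored center and scale to the stored block,
# mirroring the description of what happens when object$X is passed as newdata.
rescaled <- sweep(sweep(stored, 2, centers, "-"), 2, scales, "/")

rescaled[, "all_zero"]  # (0 - 0) / 0 gives NaN
rescaled[, "all_one"]   # (0 - 1) / 0 gives -Inf
```

The all-zero feature yields `NaN`, while the constant non-zero feature yields an infinite value, which is the case that breaks the downstream calls.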
Hence, when the `newdata` parameter is centered and scaled using the attributes of `object$X`, the function now checks whether any of the values are not finite. If so, any `Inf` or `-Inf` values are replaced by `NaN`.
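The replacement step can be sketched as follows. The helper name `replace_infinite()` and the example matrix are hypothetical and only illustrate the idea; the actual change lives inside `predict()` and may be written differently.

```r
# Hypothetical helper: the name and signature are illustrative only.
replace_infinite <- function(x) {
  # is.infinite() is TRUE only for Inf and -Inf, so existing NaN/NA entries are left alone
  x[is.infinite(x)] <- NaN
  x
}

# Example block after centering and scaling: one NaN column, one -Inf column, one valid column.
newdata <- matrix(c(NaN, NaN, -Inf, -Inf, 0.5, -0.5), nrow = 2)
cleaned <- replace_infinite(newdata)
```

After this step the block contains only finite values and `NaN`s, which the walkthrough above describes as safe for the remainder of the function.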