Theoretically, it should be sound to perform variable importance assessment based on a grid of counterfactual shift values with nominal variables; in practice, however, such variables (even when converted via `as.numeric`) have few unique values. This triggers a downstream bug: `sl3`'s `Variable_Type` classifies these variables as categorical rather than continuous. The bug is non-trivial to track down and can be distressing to users. A simple but naive solution is to add mean-zero noise to nominal variables so that they appear to have more than 20 or so unique values, which is sufficient to trick `sl3` into recognizing the variable as continuous. For example, a variable `u` with only 4 (ordered) categories will be recognized as categorical.
To have it recognized as continuous, one could add mean-zero noise to `u`; the resulting variable will have more unique values than the original `u` yet remain the same in expectation.
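
A minimal R sketch of this workaround, under stated assumptions: the data are simulated, and the sample size, seed, noise standard deviation (0.1), and the name `u_noisy` are illustrative choices rather than values from the original report. It only demonstrates the effect on the number of unique values and on the mean; it does not call `sl3` itself.

```r
# Simulated example (assumption): u is an ordered variable with 4 levels,
# stored as numeric, as it would be after as.numeric().
set.seed(429)
n <- 1000
u <- as.numeric(sample(1:4, size = n, replace = TRUE))

# Few unique values: this is what leads to the categorical classification.
n_unique_u <- length(unique(u))

# Naive fix: add mean-zero Gaussian noise. The standard deviation is an
# arbitrary illustrative choice, kept small relative to the spacing of levels.
u_noisy <- u + rnorm(n, mean = 0, sd = 0.1)

# Many more unique values, while the mean is unchanged in expectation.
n_unique_noisy <- length(unique(u_noisy))
mean_difference <- mean(u_noisy) - mean(u)
```

Because the noise has mean zero, `u_noisy` matches `u` in expectation, but it is no longer restricted to the original 4 values, so any downstream estimate that depends on the exact levels of `u` should be interpreted with that in mind.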