mayer79 / flashlight

Machine learning explanations
https://mayer79.github.io/flashlight/
GNU General Public License v2.0

Weird behaviour of ale and pdp plots #36

Closed maksymiuks closed 4 years ago

maksymiuks commented 4 years ago

Hi,

First of all, I'd like to say how impressed I am with this package!

However, during my research work, I've encountered weird behavior in the function that generates variable profiles. Let me show it:

library(flashlight)
library(MetricsWeighted)  # provides AUC

data(titanic_imputed, package = "DALEX")
ranger_model <- ranger::ranger(survived~., data = titanic_imputed, classification = TRUE, probability = TRUE)

custom_predict <- function(X.model, new_data) {
  predict(X.model, new_data)$predictions[,1]
}
fl <- flashlight(model = ranger_model, data = titanic_imputed, y = "survived", label = "Titanic Ranger",
                 metrics = list(auc = AUC), predict_function = custom_predict)

ale <- light_profile(fl, v = "fare", type = "ale")
plot(ale)

Here we see a rather plausible ALE plot for the provided data. The general trend in these data should be: the more a passenger paid, the more likely they were to survive. However, the plot was created using the probability of class 0; this will be important. Now let's create the PDP plot.

custom_predict <- function(X.model, new_data) {
  predict(X.model, new_data)$predictions[,1]
}
fl <- flashlight(model = ranger_model, data = titanic_imputed, y = "survived", label = "Titanic Ranger",
                 metrics = list(auc = AUC), predict_function = custom_predict)

pdp <- light_profile(fl, v = "fare", type = "partial dependence")
plot(pdp)

Using the same column, which contains the probability of belonging to class 0, we get an inverted PDP plot: it shows that the probability decreases as the fare increases. To get a plot that seems proper, I had to swap columns in custom_predict so that it returns the probability of belonging to class 1.

custom_predict <- function(X.model, new_data) {
  predict(X.model, new_data)$predictions[,2]
}
fl <- flashlight(model = ranger_model, data = titanic_imputed, y = "survived", label = "Titanic Ranger",
                 metrics = list(auc = AUC), predict_function = custom_predict)

pdp <- light_profile(fl, v = "fare", type = "partial dependence")
plot(pdp)
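For a two-class model, the two probability columns sum to one, so a profile computed from column 1 is expected to be the mirror image of the same profile computed from column 2. The following is a minimal base R sketch of that relationship. It uses simulated data and a logistic regression instead of the Titanic ranger model, and the helpers prob_matrix and partial_dependence are hypothetical names introduced only for this illustration, not flashlight functions.

```r
# Simulated two-class data (an assumption, not the Titanic data)
set.seed(1)
n <- 500
x <- runif(n, 0, 10)
y <- rbinom(n, 1, plogis(-2 + 0.5 * x))
fit <- glm(y ~ x, family = binomial())

# Hypothetical helper: two-column probability matrix (class 0, class 1)
prob_matrix <- function(model, new_data) {
  p1 <- predict(model, newdata = new_data, type = "response")
  cbind("0" = 1 - p1, "1" = p1)
}

# Hypothetical helper: manual partial dependence for one probability column.
# For each grid value, x is overwritten in all rows and predictions are averaged.
partial_dependence <- function(column, grid, data) {
  sapply(grid, function(g) {
    data$x <- g
    mean(prob_matrix(fit, data)[, column])
  })
}

dat <- data.frame(x = x)
grid <- seq(0, 10, by = 2)
pd0 <- partial_dependence(1, grid, dat)  # profile of P(class 0)
pd1 <- partial_dependence(2, grid, dat)  # profile of P(class 1)

# Compare the class 0 profile with one minus the class 1 profile
all.equal(pd0, 1 - pd1)
```

Under these assumptions, a decreasing profile for column 1 and an increasing profile for column 2 describe the same effect.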

Overall, it looks like one of the functions inverts the probabilities. Is this intended?

Best regards, Szymon Maksymiuk

mayer79 commented 4 years ago

Hey Szymon

Thanks for the feedback. It would indeed be weird if the response needed a swap here!

Check 1

Ranger has an encoding issue with character (non-factor) columns, see https://github.com/imbs-hl/ranger/issues/502

Since your model only uses factors, this cannot be the reason.

Check 2

Is predict working properly?

library(flashlight)
library(MetricsWeighted)
library(ranger)

set.seed(1)
data(titanic_imputed, package = "DALEX")

ranger_model <- ranger(survived~., 
                       data = titanic_imputed, 
                       classification = TRUE, 
                       probability = TRUE)

custom_predict <- function(X.model, new_data) {
  predict(X.model, new_data)$predictions[, 2]
}

fl <- flashlight(model = ranger_model, 
                 data = titanic_imputed, y = "survived", label = "Titanic Ranger",
                 metrics = list(auc = AUC), 
                 predict_function = custom_predict)

# Use predict method of flashlight
predict(fl, data = head(titanic_imputed))
# 0.09135881 0.27651991 0.11710044 0.59006178 0.72432168 0.23170322

# Use predict method of ranger
predict(ranger_model, head(titanic_imputed))$predictions[, 2]
# 0.09135881 0.27651991 0.11710044 0.59006178 0.72432168 0.23170322

Looks good to me.

Check 3

What does the distribution of the predictor look like?

hist(titanic_imputed$fare)

image

Now it looks as if we have identified the problem: the distribution is very skewed, so most ALE evaluation points are based on only very few observations!
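To make the sparsity argument concrete, here is a small base R sketch. It assumes equal-width bins over the full range of the predictor, which is a simplification and may differ from how the package picks its default evaluation points, and it uses a simulated log-normal variable as a stand-in for a fare-like column.

```r
set.seed(1)
# Hypothetical right-skewed variable (an assumption, not the Titanic fare)
x <- rlnorm(2000, meanlog = 3, sdlog = 1)

# Eleven equal-width bins across the full range of x
bins <- cut(x, breaks = seq(min(x), max(x), length.out = 12),
            include.lowest = TRUE)
counts <- table(bins)
print(counts)

# Share of all observations that fall into the first bin
counts[[1]] / length(x)
```

Most bins in the long right tail hold only a handful of observations, so any per-bin average computed there is driven by very few rows.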

Proposed solution

Select evaluation points in the dense part of the covariate's distribution.

evaluate_at <- 0:100

pdp <- light_profile(fl, v = "fare", pd_evaluate_at = evaluate_at)
plot(pdp)

ale <- light_profile(fl, v = "fare", type = "ale", pd_evaluate_at = evaluate_at)
plot(ale)
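The grid 0:100 above was picked by looking at the histogram. As an alternative sketch, evaluation points could be derived from quantiles so that they automatically stay in the dense region. The snippet below is base R only and uses a simulated skewed variable; the choice of the 5% to 95% quantiles is an assumption made for illustration.

```r
set.seed(1)
# Hypothetical right-skewed predictor (an assumption, not the Titanic fare)
x <- rlnorm(2000, meanlog = 3, sdlog = 1)

# Evaluation points at the 5%, 10%, ..., 95% quantiles, rounded and de-duplicated
probs <- seq(0.05, 0.95, by = 0.05)
evaluate_at <- unique(round(quantile(x, probs = probs, names = FALSE)))

range(evaluate_at)  # stays well below the extreme tail
max(x)
```

A vector built this way could then be passed as pd_evaluate_at, in the same way as in the calls above.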

PDP

image

ALE

image

Now there is some similarity across the methods. The remaining differences probably come from the correlation of fare with parch and class:

boxplot(fare~class, data = titanic_imputed)

image

boxplot(fare~parch, data = titanic_imputed)

image

So I'd actually expect differences between PDP and ALE, as the "everything else being fixed" logic behind PDP is not realistic when predictors are correlated.
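The difference between the two ideas can be sketched in a few lines of base R. This is not flashlight's implementation: the "model" f is a hypothetical known prediction function with an interaction, the data are simulated with two strongly correlated predictors, and the ALE shown is a simplified, uncentered first-order version.

```r
set.seed(1)
n <- 1000
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # strongly correlated with x1
dat <- data.frame(x1 = x1, x2 = x2)

# Hypothetical "model": a known prediction function with an interaction
f <- function(d) d$x1 * d$x2

breaks <- seq(0, 1, by = 0.1)

# Partial dependence: set x1 to each grid value for ALL rows, then average.
# This pairs, e.g., x1 = 0 with rows whose x2 is close to 1.
pd <- sapply(breaks, function(g) {
  d <- dat
  d$x1 <- g
  mean(f(d))
})

# Simplified ALE: within each bin of x1, average the change in prediction
# between the bin edges, using only the rows actually observed in that bin
bin <- cut(dat$x1, breaks = breaks, include.lowest = TRUE, labels = FALSE)
local_effect <- sapply(seq_len(length(breaks) - 1), function(k) {
  d <- dat[bin == k, , drop = FALSE]
  lower <- d
  lower$x1 <- breaks[k]
  upper <- d
  upper$x1 <- breaks[k + 1]
  mean(f(upper) - f(lower))
})
ale <- c(0, cumsum(local_effect))

# Both profiles shifted to start at zero for comparison
data.frame(x1 = breaks, pd = pd - pd[1], ale = ale)
```

In this setup the partial dependence profile is a straight line, because every grid value is combined with the full distribution of x2, while the accumulated local effects grow faster at larger x1, because each bin only uses the x2 values that actually occur there.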

maksymiuks commented 4 years ago

Thank you for the extensive response. I really appreciate it. Indeed, it looks like the problem was the skewed distribution combined with my use of the default parameters.

Once again thanks :)