tidymodels / yardstick

Tidy methods for measuring model performance
https://yardstick.tidymodels.org/

weighted normalized gini #442

Open SimonCoulombe opened 1 year ago

SimonCoulombe commented 1 year ago

It would be awesome to add weighted normalized Gini to the set of available metrics for regression. This is useful in insurance when we want to evaluate the performance of a loss cost model. We order the predictions by the predicted "annualized loss cost", but weight them by the exposure (the time the policy was actually in force) to get the actual dollar amount.

It is discussed (with code) in the following Kaggle thread from the fire peril loss cost competition: https://www.kaggle.com/c/liberty-mutual-fire-peril/discussion/9880

Here is some code I use outside tidymodels. It is inspired by the function posted by pimin the kaggle thread. I think he had inverted the sign in the weighted gini, which meant a perfect prediction would get a gini of -0.999 instead of 0.999.

The formula is derived from this 2015 blog post: http://blog.nguyenvq.com/blog/2015/09/25/calculate-the-weighted-gini-coefficient-or-auc-in-r/
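For reference, this is how I read the calculation in the code that follows, written out as a formula. The symbols are introduced here for illustration only: $a_i$ is the actual loss cost, $w_i$ the exposure, and the observations $i = 1, \dots, n$ are sorted by decreasing predicted loss cost $p_i$.

```latex
W_i = \frac{\sum_{j \le i} w_j}{\sum_{j=1}^{n} w_j}, \qquad
L_i = \frac{\sum_{j \le i} a_j w_j}{\sum_{j=1}^{n} a_j w_j}, \qquad
G(a, p, w) = \sum_{i=1}^{n-1} \left( L_i W_{i+1} - L_{i+1} W_i \right)

G_{\text{norm}}(a, p, w) = \frac{G(a, p, w)}{G(a, a, w)}
```

Here $W_i$ is the cumulative share of exposure, $L_i$ is the cumulative share of exposure-weighted losses, and $G(a, a, w)$ is the value obtained when the observations are ranked by the actual loss cost itself.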


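#' Weighted Gini coefficient
#'
#' @param actual Actual loss cost.
#' @param predicted Predicted loss cost, used only to rank the observations.
#' @param weights Earned exposure.
#'
#' @return A single numeric value: the weighted Gini coefficient (not normalized).
#' @export
weighted_gini <- function(actual, predicted, weights) {
  df <- data.frame(actual, weights, predicted)
  n <- nrow(df)
  # Rank observations from highest to lowest predicted loss cost
  df <- df[order(df$predicted, decreasing = TRUE), ]
  # Cumulative share of exposure
  df$cum_weight <- cumsum(df$weights / sum(df$weights))
  # Cumulative exposure-weighted losses, rescaled to end at 1 (the Lorenz curve)
  df$cum_pos_found <- cumsum(df$actual * df$weights)
  df$Lorenz <- df$cum_pos_found / df$cum_pos_found[n]
  # Twice the signed area between the Lorenz curve and the diagonal
  sum(df$Lorenz[-n] * df$cum_weight[-1]) - sum(df$Lorenz[-1] * df$cum_weight[-n])
}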

#' Normalized weighted Gini coefficient
#'
#' @param actual Actual loss cost.
#' @param predicted Predicted loss cost.
#' @param weights Earned exposure.
#'
#' @return The weighted Gini of the predictions divided by the weighted Gini of a
#'   perfect ranking (ordering by `actual`), so a perfect ranking scores 1.
#' @export
normalized_weighted_gini <- function(actual, predicted, weights) {
  weighted_gini(actual, predicted, weights) / weighted_gini(actual, actual, weights)
}
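
As a quick sanity check of the sign convention, here is a minimal sketch on made-up data. The four policies and their values are invented for illustration, and the two functions are restated in compact vector form only so that the snippet runs on its own; it is meant to behave the same as the code above, not to replace it.

```r
# Compact base-R restatement of the two functions, so this block is self-contained.
weighted_gini <- function(actual, predicted, weights) {
  ord <- order(predicted, decreasing = TRUE)
  n <- length(ord)
  cum_weight <- cumsum(weights[ord]) / sum(weights)
  lorenz <- cumsum(actual[ord] * weights[ord]) / sum(actual * weights)
  sum(lorenz[-n] * cum_weight[-1]) - sum(lorenz[-1] * cum_weight[-n])
}

normalized_weighted_gini <- function(actual, predicted, weights) {
  weighted_gini(actual, predicted, weights) / weighted_gini(actual, actual, weights)
}

# Hypothetical toy data: four policies.
actual   <- c(0, 100, 0, 500)     # observed annualized loss cost
exposure <- c(1, 0.5, 1, 0.25)    # earned exposure, in years

# A prediction that ranks policies exactly like the actuals ...
perfect_score <- normalized_weighted_gini(actual, actual, exposure)
# ... and one that ranks them in exactly the opposite order.
reversed_score <- normalized_weighted_gini(actual, -actual, exposure)

print(c(perfect = perfect_score, reversed = reversed_score))
```

With this sign convention, the perfect ranking scores 1 and the fully reversed ranking scores -1, which is the behaviour I would expect from a metric that should be maximized.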
simonpcouch commented 10 months ago

Related to #147.