smouksassi / ggquickeda

ggplot2 and table1 summary statistics quick exploration of data
https://smouksassi.github.io/ggquickeda/

[Feature request] Yeo-Johnson transformation for numeric variables #24

Open lucazav opened 3 years ago

lucazav commented 3 years ago

It would be really useful to be able to apply the Yeo-Johnson transformation to numeric variables, instead of only the "log10" one. That way, left-skewed distributions can be handled as well.
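To make the request concrete, here is a minimal base R sketch of the Yeo-Johnson transformation for a fixed `lambda`. The function name `yeo_johnson` and the example values are only illustrative and are not part of ggquickeda; in practice `lambda` would usually be estimated from the data rather than set by hand.

```r
# Yeo-Johnson transformation for a single, fixed lambda.
# Unlike log10(), it is defined for zero and negative values.
yeo_johnson <- function(x, lambda) {
  stopifnot(is.numeric(x), is.numeric(lambda), length(lambda) == 1)
  out <- rep(NA_real_, length(x))   # missing values stay NA
  pos <- !is.na(x) & x >= 0
  neg <- !is.na(x) & x < 0

  # Non-negative values: Box-Cox-like branch applied to (x + 1)
  out[pos] <- if (abs(lambda) < 1e-8) {
    log1p(x[pos])
  } else {
    ((x[pos] + 1)^lambda - 1) / lambda
  }

  # Negative values: mirrored branch with exponent (2 - lambda)
  out[neg] <- if (abs(lambda - 2) < 1e-8) {
    -log1p(-x[neg])
  } else {
    -((1 - x[neg])^(2 - lambda) - 1) / (2 - lambda)
  }

  out
}

x <- c(-2, -0.5, 0, 0.5, 2)
suppressWarnings(log10(x))      # NaN for negative values, -Inf for zero
yeo_johnson(x, lambda = 0.5)    # finite for every value
yeo_johnson(x, lambda = 1)      # lambda = 1 leaves the data unchanged
```

Values of `lambda` above 1 stretch the right tail, which is what helps with left-skewed variables, while values below 1 compress it.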

lucazav commented 3 years ago

This one could be the function to implement:

library(dplyr)
library(recipes)

# Center, scale, and Yeo-Johnson-transform every numeric column of `data`.
# Returns a list with the transformed data frame and the estimated lambdas.
yeo_johnson_transf <- function(data) {
  rec <- recipe(~ ., data = data) %>%
    step_center(all_numeric()) %>%
    step_scale(all_numeric()) %>%
    step_YeoJohnson(all_numeric())

  prep_rec <- prep(rec, training = data)

  # step_YeoJohnson() is the third step, so its lambdas live in steps[[3]]
  list(df_yeojohnson = bake(prep_rec, data),
       lambdas = prep_rec$steps[[3]][["lambdas"]])
}

yeo_johnson_list <- iris %>%
  yeo_johnson_transf()

transf_iris <- yeo_johnson_list$df_yeojohnson
transf_iris

lambdas_iris <- yeo_johnson_list$lambdas
lambdas_iris

smouksassi commented 3 years ago

Thanks for your interest in ggquickeda. It seems like a useful feature. Do you have an idea of where in the workflow you would like me to implement it: as a transformation of the variable itself, or of the ggplot scale? I think I have a menu that allows dividing a numeric variable by a constant or by another column; I can fit it in there.

lucazav commented 3 years ago

I think it'd be great to have it as a transformation of the variable itself.
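For reference, the two options discussed above could look roughly like this outside the app. This is only a sketch: it assumes the `yj_trans()` transformation object from the scales package, a hand-picked `lambda` of 0.5 rather than an estimated one, and the `iris` data as a stand-in. It does not reflect how ggquickeda would actually implement the feature.

```r
library(ggplot2)
library(scales)

lambda <- 0.5            # hypothetical value, chosen only for illustration
yj <- yj_trans(lambda)   # Yeo-Johnson transformation object from scales

# Option 1: transform the variable itself and plot the new column.
# The axis then shows values on the transformed scale.
iris_yj <- iris
iris_yj$Sepal.Length_yj <- yj$transform(iris$Sepal.Length)
p_variable <- ggplot(iris_yj, aes(x = Sepal.Length_yj)) +
  geom_histogram(bins = 20)

# Option 2: keep the data as-is and transform the ggplot scale.
# The axis labels stay in the original units.
p_scale <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20) +
  scale_x_continuous(trans = yj)
```

With option 1 the transformed column is also available to summary tables and other downstream steps, whereas option 2 only changes how the plot is drawn.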