Open njtierney opened 7 years ago
Perhaps this could include something like stat_function(), where users supply an imputation method and then pass args to it. Something like:
ggplot(data = data,
       aes(x = var1,
           y = var2)) +
  geom_imputed_point(fun = mice,
                     args = list(mice_options...))

ggplot(data = data,
       aes(x = var1)) +
  geom_imputed_density(fun = mice,
                       args = list(mice_options...))
Just an idea to keep track of.
In this way, geom_impute_* would have similar options to geom_smooth(), where you can specify method = "lm", "loess", etc.
However, I am not convinced that imputations should have the same treatment - you often want to use these values again.
So I think that imputing values should be a separate data tidying step.
There needs to be a clever way to keep track of the values that are imputed, without blowing out the size of the dataframe by storing the entire dataset twice (or m times, for multiple imputation). This is the idea behind the shadow matrix, but I wonder if there should be a better way to store this info in a nice print method, where users don't see some shadow vector/index that sits behind the data.
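To make the bookkeeping idea concrete, here is a minimal base R sketch. It assumes a simple mean imputation purely as a stand-in for a real method, and the names dat, shadow and imputed are hypothetical. It is not how naniar stores this information, just one way to picture keeping a record of which cells were imputed without a second copy of the data.

```r
# Toy data with some missing values (hypothetical example data)
dat <- data.frame(
  x = c(1.2, NA, 3.4, 4.1, NA),
  y = c(10, 20, NA, 40, 50)
)

# Shadow matrix: one logical per cell, TRUE where the value was missing
shadow <- is.na(dat)

# Impute each column with its observed mean
# (a placeholder for whatever imputation method is actually used)
imputed <- dat
for (col in names(imputed)) {
  miss <- shadow[, col]
  imputed[miss, col] <- mean(dat[[col]], na.rm = TRUE)
}

# The shadow still records which cells were imputed
imputed$x[shadow[, "x"]]  # the imputed x values
colSums(shadow)           # number of imputed cells per column
```

The imputed data frame has the same shape as the original, and the shadow can be consulted whenever the imputed cells need to be identified again, for example when plotting or re-imputing.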
Need to collate all of these thoughts together.
Imputation should be a stat, not a geom, if it is to be included with ggplot.
Thanks Di!
So, this should be something along the lines of stat_impute(), which you can almost imagine existing for this example here. The code might look like the following:
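library(ggplot2)

set.seed(1492)
df <- data.frame(
  x = rnorm(100)
)
# set 10 randomly chosen rows to missing
df[sample(x = 100, size = 10), ] <- NA
df
x <- df$x
base <- ggplot(df, aes(x)) + geom_density()
# stat_impute() is the proposed stat; it does not exist yet
base + stat_impute(fun = impute_lm,
                   colour = "red",
                   args = list(x ~ .))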
stat_impute doesn't work for me with the current naniar, but that is ok. I get the gist of it.
colour should be associated with geom_density, and it should be mapped to a variable indicating missing status.
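A rough sketch of that suggestion, using only existing ggplot2 functions: record the missing status before imputing, then map colour to it in geom_density(). The column name x_missing is hypothetical, and the imputation here (random draws from the observed values) is only a placeholder assumption, not a recommended method.

```r
library(ggplot2)

set.seed(1492)
df <- data.frame(x = rnorm(100))
df$x[sample(100, 10)] <- NA

# Record missing status before imputing (hypothetical column name)
df$x_missing <- is.na(df$x)

# Placeholder imputation: draw from the observed values with replacement
observed <- df$x[!df$x_missing]
df$x[df$x_missing] <- sample(observed, sum(df$x_missing), replace = TRUE)

# Colour is mapped to missing status, so observed and imputed
# values get separate density curves
p <- ggplot(df, aes(x = x, colour = x_missing)) +
  geom_density()
p
```

Because colour is an aesthetic mapping rather than a fixed "red", the legend distinguishes observed from imputed values automatically.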
This would be a new geom built for imputed data / imputed dataframes.
Not sure how the specifics of this would work, but something like:
Then this could display something similar to geom_missing_point(), but instead show the imputed values in addition to the regular data. This might use shadow_bind or shadow_augment or similar to represent the imputations somehow.