Closed by njtierney 1 year ago
This approach includes an .integer method, which just calls round() on the
estimated mode. That is perhaps not the best approach, but it seems pretty good?
library(naniar)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
vec <- rnorm(10)
vec[sample(1:10, 3)] <- NA
impute_mode(vec)
#> [1] -0.8667728 -1.3941443 -1.2821929 -1.1371585 -0.7508977 -1.0704500
#> [7] -1.1450668 -1.1450668 1.1446806 -1.1450668
dat <- tibble(
  num = rnorm(10),
  int = as.integer(rpois(10, 5)),
  fct = factor(LETTERS[1:10])
) %>%
  mutate(
    across(
      everything(),
      \(x) set_prop_miss(x, prop = 0.25)
    )
  )
dat
#> # A tibble: 10 × 3
#>        num   int fct
#>      <dbl> <int> <fct>
#>  1 NA          6 A
#>  2 NA         NA B
#>  3  0.364      7 C
#>  4 -1.22       4 D
#>  5  0.0346     3 <NA>
#>  6  0.0860     5 F
#>  7 -0.486      4 <NA>
#>  8 -0.930      5 H
#>  9  0.932     NA I
#> 10 -0.946      5 J
dat %>%
  nabular() %>%
  mutate(
    num = impute_mode(num),
    int = impute_mode(int),
    fct = impute_mode(fct)
  )
#> # A tibble: 10 × 6
#>        num   int fct   num_NA int_NA fct_NA
#>      <dbl> <dbl> <fct> <fct>  <fct>  <fct>
#>  1 -0.787      6 A     NA     !NA    !NA
#>  2 -0.787      5 B     NA     NA     !NA
#>  3  0.364      7 C     !NA    !NA    !NA
#>  4 -1.22       4 D     !NA    !NA    !NA
#>  5  0.0346     3 A     !NA    !NA    NA
#>  6  0.0860     5 F     !NA    !NA    !NA
#>  7 -0.486      4 A     !NA    !NA    NA
#>  8 -0.930      5 H     !NA    !NA    !NA
#>  9  0.932      5 I     !NA    NA     !NA
#> 10 -0.946      5 J     !NA    !NA    !NA
Created on 2023-04-10 with reprex v2.0.2
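For readers who want to see the shape of the dispatch described at the top, here is a minimal sketch of how a density-based default method, a rounding .integer method, and a factor method could fit together. This is not the code on the branch: the names mode_sketch() and impute_mode_sketch() are hypothetical, and the factor method (most frequent level, ties going to the first level) is an assumption based only on the output above.

```r
# Hypothetical sketch; these names are illustrative and not taken from naniar.
mode_sketch <- function(x, ...) UseMethod("mode_sketch")

# Doubles: use the location of the peak of a kernel density estimate.
mode_sketch.default <- function(x, ...) {
  d <- stats::density(x, na.rm = TRUE)
  d$x[which.max(d$y)]
}

# Integers: reuse the density estimate, then round to a whole number.
# round() returns a double, so the imputed vector becomes double as well.
mode_sketch.integer <- function(x, ...) {
  round(mode_sketch.default(x))
}

# Factors (assumption): the most frequent level; table() ignores NA by default
# and which.max() resolves ties in favour of the first level.
mode_sketch.factor <- function(x, ...) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Replace missing values with the estimated mode of the observed values.
impute_mode_sketch <- function(x) {
  x[is.na(x)] <- mode_sketch(x)
  x
}
```

Because round() returns a double, the imputed integer vector is coerced to double in this sketch, which would be consistent with the int column printing as <dbl> in the output above.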
Current branch: https://github.com/njtierney/naniar/tree/impute-mode
Uses suggestions from here: https://stackoverflow.com/questions/2547402/is-there-a-built-in-function-for-finding-the-mode
Personally, I think it would be best if the new mode function offered an option for choosing between different estimators of the mode. Using the density is neat, but it doesn't work well when you have only integers, because the peak of the density estimate generally isn't one of the observed values.
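To make the suggestion concrete, here is a rough sketch of what such an option might look like. The function name estimate_mode() and its method argument are hypothetical, not an existing naniar API; the "frequency" branch follows the unique()/tabulate() idiom from the Stack Overflow thread linked above.

```r
# Hypothetical helper, not part of naniar: choose how the mode is estimated.
estimate_mode <- function(x, method = c("density", "frequency")) {
  method <- match.arg(method)
  x <- x[!is.na(x)]

  if (method == "frequency") {
    # Most frequent observed value; ties go to the value seen first.
    ux <- unique(x)
    return(ux[which.max(tabulate(match(x, ux)))])
  }

  # Default: location of the peak of a kernel density estimate.
  d <- stats::density(x)
  d$x[which.max(d$y)]
}

# The frequency estimator always returns an observed value of the same type.
estimate_mode(c(2L, 2L, 7L, 7L, 7L, NA), method = "frequency")
```

With method = "frequency" the result stays an integer for integer input, so no rounding step is needed; the density estimator remains the default for continuous data.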