metrumresearchgroup / yspec

Data Specification for Pharmacometrics
https://metrumresearchgroup.github.io/yspec

Fill in missing values when making factor #78

Closed kylebaron closed 2 years ago

kylebaron commented 2 years ago

Summary

When calling ys_add_factors(), look for missing values (NA) and, if requested, fill them in with a label that becomes its own factor level.

Reprex

library(tidyverse)
library(yspec)

spec <- ys_help$spec()
data <- ys_help$data()

data$RF[c(3,10,32)] <- NA_character_
count(data, RF)
#>     RF    n
#> 1 mild  360
#> 2  mod  360
#> 3 norm 3277
#> 4  sev  360
#> 5 <NA>    3

data <- ys_add_factors(data, spec, .missing = "Missing")

count(data, RF_f)
#>       RF_f    n
#> 1   Normal 3277
#> 2     Mild  360
#> 3 Moderate  360
#> 4   Severe  360
#> 5  Missing    3

Created on 2021-08-13 by the reprex package (v2.0.0)

KatherineKayMRG commented 2 years ago

Hey @kylebaron - This looks great! One thought: can you make sure it always preserves the order in which the factor levels are defined in the yspec and only adds "Missing" at the end? It looks like you're already taking care of this, but one thing I ran into when I did this manually was that it rearranged the order of my factor levels.

kylebaron commented 2 years ago

@KatherineKayMRG One thing I'm doing is refusing to process any column that is already a factor: if a factor comes in, it is returned unchanged. Otherwise, we're working with non-factors, and the order of the levels comes from the spec. Because of that, we can simply append the missing-value label as the last level when building the factor. That is, we get a chance to intervene before the level order is set, which ensures "Missing" is last.
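
For readers following along, the ordering logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the yspec implementation: `make_factor_with_missing()` and its arguments `values`, `labels`, and `missing` are hypothetical names, and the choice to add the extra level only when `NA` is actually present is an assumption.

```r
# Hypothetical helper illustrating the idea; NOT the yspec source code.
# `values` and `labels` stand in for the level order defined in the spec.
make_factor_with_missing <- function(x, values, labels, missing = NULL) {
  # A column that is already a factor is returned untouched.
  if (is.factor(x)) return(x)
  x <- as.character(x)
  values <- as.character(values)
  # Assumption: only add the extra level when NA is actually present.
  if (!is.null(missing) && anyNA(x)) {
    x[is.na(x)] <- missing
    # Append after the spec-defined levels so the new level is always last.
    values <- c(values, missing)
    labels <- c(labels, missing)
  }
  # Level order is fixed here, from the spec order, not from sorting the data.
  factor(x, levels = values, labels = labels)
}

rf <- c("norm", "mild", NA, "sev", "mod")
rf_f <- make_factor_with_missing(
  rf,
  values  = c("norm", "mild", "mod", "sev"),
  labels  = c("Normal", "Mild", "Moderate", "Severe"),
  missing = "Missing"
)
levels(rf_f)
```

By contrast, a manual fix such as `factor(replace(rf, is.na(rf), "Missing"))` lets `factor()` sort the unique values, which is one way the level order can end up rearranged.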

KatherineKayMRG commented 2 years ago

That's great, and much nicer than how I had to do it manually. Thanks for explaining.