drop unobserved factor levels

tklebel commented 8 years ago

Think about the following case:

test_df2 <- data.frame(
  gender = factor(c("male", "female")),
  smoke = factor(c(rep("yes", 5), rep("no", 5))),
  age = factor(c("young", "old"))
)
cross_table(test_df2, smoke~gender)

# now we recode a level to be missing
test_df2$gender[test_df2$gender == "female"] <- NA

# females still show up
cross_table(test_df2, smoke~gender)

# levels should be removed too
test_df2$gender <- factor(test_df2$gender, levels = "male")
cross_table(test_df2, smoke~gender)

Should we simply call layout_column(drop = TRUE) to drop all unobserved factor levels? Or should we let the user specify which levels to remove?

For the first case the computation could make use of tidyr::complete(model_frame).

gmodels::CrossTable seems to drop unobserved factor levels. For exploratory analysis this is not ideal: you should be able to notice when some combinations were not observed in the data.
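For comparison, here is a minimal base R sketch of what "noticing" an unobserved combination looks like. It uses table() rather than cross_table(), so it says nothing about how crosstabr itself behaves. It only shows that the recoded level stays visible as a column of zeros.

```r
# Data frame from the example above, with "female" recoded to NA
test_df2 <- data.frame(
  gender = factor(c("male", "female")),
  smoke = factor(c(rep("yes", 5), rep("no", 5))),
  age = factor(c("young", "old"))
)
test_df2$gender[test_df2$gender == "female"] <- NA

# The level "female" is still attached to the factor
levels(test_df2$gender)

# table() drops the NA rows by default but keeps the empty level,
# so the unobserved category appears with zero counts
tab <- table(smoke = test_df2$smoke, gender = test_df2$gender)
print(tab)
```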

Maybe layout_column() could gain the argument drop:

drop can be TRUE or FALSE, with TRUE as the default.
Alternatively, you can provide a character vector specifying the levels to drop (if not all unobserved levels should be dropped).
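To make the proposed semantics concrete, here is a rough sketch for a single factor. The helper apply_drop() is hypothetical and not part of crosstabr. The rule that a named level is only removed when it has no observations is my assumption, not something decided in this thread.

```r
# Hypothetical helper, NOT part of crosstabr: applies the proposed `drop`
# semantics to one factor.
apply_drop <- function(x, drop = TRUE) {
  stopifnot(is.factor(x))
  if (isTRUE(drop)) {
    return(droplevels(x))  # remove every unobserved level
  }
  if (isFALSE(drop)) {
    return(x)              # keep all levels, observed or not
  }
  if (is.character(drop)) {
    observed <- unique(as.character(x[!is.na(x)]))
    unobserved <- setdiff(levels(x), observed)
    # Assumption: never remove a level that still has observations
    to_drop <- intersect(drop, unobserved)
    return(factor(x, levels = setdiff(levels(x), to_drop)))
  }
  stop("`drop` must be TRUE, FALSE, or a character vector of levels")
}

g <- factor(c("male", NA, "male"), levels = c("female", "male", "other"))
levels(apply_drop(g))                   # only observed levels remain
levels(apply_drop(g, drop = FALSE))     # all levels kept
levels(apply_drop(g, drop = "female"))  # only "female" removed
```

Restricting the character form to unobserved levels avoids silently turning real observations into NA. Whether that is the right behaviour is open for discussion.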

tklebel commented 8 years ago

We shouldn't simply drop unobserved levels: for exploratory analysis it is important to be aware of missing categories.

tklebel commented 8 years ago

Base R actually has a function to drop unused levels: droplevels().
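A short sketch of how droplevels() behaves on a data frame. The data frame df is made up for illustration, and the `except` argument is shown as one possible way to protect individual columns.

```r
# Illustrative data: "female" and "old" are declared levels with no observations
df <- data.frame(
  gender = factor(c("male", "male", NA), levels = c("female", "male")),
  age = factor(c("young", "young", "young"), levels = c("old", "young"))
)

# Drop unused levels in every factor column
all_dropped <- droplevels(df)
levels(all_dropped$gender)
levels(all_dropped$age)

# Keep the levels of column 2 (age) untouched via `except`
age_kept <- droplevels(df, except = 2)
levels(age_kept$gender)
levels(age_kept$age)
```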

tklebel / crosstabr #11