grantmcdermott / tinyplot

Lightweight extension of the base R graphics system
https://grantmcdermott.com/tinyplot
Apache License 2.0

facet formulas don't play nice with type = "density" #103

Closed grantmcdermott closed 5 months ago

grantmcdermott commented 5 months ago
library(plot2)

aq = airquality
aq$hot = ifelse(aq$Temp>=75, "hot", "cold")
aq$windy = ifelse(aq$Wind>=15, "windy", "calm")

Combining the atomic plot2.default method with a facet formula fails outright with an error.

with(
  aq,
  plot2(
    Ozone, 
    type = "density",
    facet = windy ~ hot,
    fill = "by",
    grid = TRUE, frame = FALSE, 
    main = "Ozone pollution is worse on hot, calm days"
  )
)
#> Error in density.default(x): 'x' contains missing values
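The error above comes from base R rather than anything facet-specific: `density()` refuses input that contains `NA`s unless told to drop them. A minimal sketch using only base R (how plot2 should pass this through internally is not something this example assumes):

```r
x = airquality$Ozone  # contains missing values

# Default behaviour: density() stops on NAs; capture the message instead of failing
err = tryCatch(density(x), error = function(e) conditionMessage(e))
err

# Dropping NAs explicitly lets the estimate go through
d = density(x, na.rm = TRUE)
class(d)
```

Whatever route is taken, the facet variables would need to be subset to the same rows, otherwise the groups no longer line up with the data.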

Retrying with plot2.formula runs, but the result is wrong: the splits are incorrect because of missing-data mismatches, so the plotted distributions are wrong too. Furthermore, the facets are in the wrong order (the layout is no longer a facet grid) and the facet titles haven't been parsed correctly.

plot2(
  ~ Ozone,
  data = aq, 
  type = "density",
  facet = windy ~ hot,
  fill = "by",
  # the rest of these arguments are optional...
  grid = TRUE, frame = FALSE, 
  main = "Ozone pollution is worse on hot, calm days"
)
#> Warning in split.default(x, f = facet): data length is not a multiple of split
#> variable
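The length mismatch behind this warning can be illustrated with base R alone. The sketch below assumes (this is a guess at the mechanism, not a reading of the plot2 source) that the main formula and the facet variables are turned into model frames separately, each with the default `na.omit`. The names `mf_x`, `mf_facet`, `mf` and `groups` are made up for illustration.

```r
aq = airquality
aq$hot = ifelse(aq$Temp >= 75, "hot", "cold")
aq$windy = ifelse(aq$Wind >= 15, "windy", "calm")

# Separate model frames: each applies na.omit to its own variables only
mf_x = model.frame(~ Ozone, data = aq)            # rows with NA Ozone are dropped
mf_facet = model.frame(~ windy + hot, data = aq)  # no NAs here, so all rows are kept
c(nrow(mf_x), nrow(mf_facet))
#> [1] 116 153

# One joint model frame: rows are dropped once, so the variables stay aligned
mf = model.frame(~ Ozone + windy + hot, data = aq)
groups = split(mf$Ozone, list(mf$windy, mf$hot))
lengths(groups)
```

Splitting the 116 Ozone values by a 153-element facet vector is what triggers the "not a multiple of split variable" warning, whereas the joint frame keeps every observation paired with its own facet values.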

Subsetting ahead of time does "work", insofar as it produces the right distributions, but again the facet grid layout has not been parsed correctly.

plot2(
  ~ Ozone,
  data = subset(aq, !is.na(Ozone)), 
  type = "density",
  facet = windy ~ hot,
  fill = "by",
  # the rest of these arguments are optional...
  grid = TRUE, frame = FALSE, 
  main = "Ozone pollution is worse on hot, calm days"
)

Created on 2024-01-25 with reprex v2.0.2

In summary, there are two things happening here. First, the na.action / na.omit steps aren't applied consistently, e.g. when constructing model frames from the formulae, so we get mismatches if either the main (x/y/formula) or facet variables have missing observations. Second, the grid layout gets lost along the way, likely because the facet variables are split and recombined in the plot2.density logic (which is unavoidable, but causes them to lose some attributes).
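The attribute loss mentioned in the second point is general base R behaviour and can be sketched independently of plot2. The attribute name `facet_grid` below is hypothetical and only stands in for whatever metadata marks a facet as a grid; the remedy shown (saving and re-attaching the attribute) is one possible approach, not necessarily the fix adopted in the package.

```r
# A vector carrying a custom attribute (hypothetical name, for illustration only)
x = structure(1:6, facet_grid = TRUE)
g = c("a", "b", "a", "b", "a", "b")

# Splitting and recombining returns the same values but drops the custom attribute
roundtrip = unsplit(split(x, g), g)
attr(roundtrip, "facet_grid")
#> NULL

# One possible remedy: stash the attribute beforehand and re-attach it afterwards
keep = attr(x, "facet_grid")
attr(roundtrip, "facet_grid") = keep
identical(roundtrip, x)
```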