njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/
Other
651 stars 53 forks source link

Bugfix geom_miss_point. Make lables factors #293

Closed maksymiuks closed 1 year ago

maksymiuks commented 2 years ago

Description

Fix error occurring when shape is passed in mapping. Basically when one tried to specify shape in aes passed to geom_miss_point (what technically should be possible) for some reason deafult_aes was used for colour that didn't have ..missing.. in env (nor parent env) resulting in error.

Additionally I introduced small change to how labels are constructed. Now they are factor instead of character so the Missing (and by extension Not Missing) will always have the same color regardless of input data. Previously have we split our data into two sub-data frames where one of them would have only complete cases, while second would also include NAs, then plot for the first sub-data frames appeared with different color mappings (on Missing and Not Missing) than second sub-data frame.

Related Issue

https://github.com/njtierney/naniar/issues/290

Example

Thanks to that code below will work

data <- DALEX::apartments
data <- missForest::prodNA(data)
data$no.rooms <- as.factor(data$no.rooms)

library(ggplot2)
library(naniar)

ggplot(data = data, mapping = aes(.data$m2.price, .data$surface)) +
  geom_miss_point()

ggplot(data = data, mapping = aes(.data$m2.price, .data$surface, color = .data$district)) +
  geom_miss_point()

ggplot(data = data, mapping = aes(.data$m2.price, .data$surface, shape = .data$district)) +
  geom_miss_point()

ggplot(data = data, mapping = aes(.data$m2.price, .data$surface, shape = .data$no.rooms, color = .data$district)) +
  geom_miss_point()

previously 3rd and 4th plot would not appear (error would be returned instead)

Tests

I made changes to code that is already covered by unit tests. Addiitioanlly related tests were updated.

NEWS + DESCRIPTION

Done

jonocarroll commented 2 years ago

Just to help with the review of this, here's my previous construction from #290 with these changes applied

library(naniar)
library(ggplot2)
mydat <- mtcars
mydat$mpg[1:10] = NA
ggplot(mydat, aes(x = mpg, y = disp, shape = factor(cyl))) +
  geom_miss_point(size = 4)

Created on 2021-11-30 by the reprex package (v2.0.1)

FYI, I'm not sure if you need/want to keep the default colour order, but with these changes the Missing points are blue and the Not Missing points are red. That's a consequence of making Missing be the last entry in the factor. With all Not Missing this is consistent

mydat <- mtcars
ggplot(mydat, aes(x = mpg, y = disp, shape = factor(cyl))) +
  geom_miss_point(size = 4)

but with all Missing the points are mis-labelled Not Missing

mydat <- mtcars
mydat$mpg = NA
ggplot(mydat, aes(x = mpg, y = disp, shape = factor(cyl))) +
  geom_miss_point(size = 4)

Created on 2021-11-30 by the reprex package (v2.0.1)

The reason here seems to be that passing all NA through GeomPoint results in a vector of non-missing values

[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
attr(,"class")
[1] "mapped_discrete" "numeric"  

I can suggest an improvement to label_miss_2d of

# all missing
if (all(x1 == 1) && inherits(x1, "mapped_discrete")) x1[] <- NA
if (all(x2 == 1) && inherits(x2, "mapped_discrete")) x2[] <- NA
temp <- data.frame(x1, x2) %>% is.na %>% rowSums() %>% factor(levels = c("0", "1", "2"))
forcats::fct_recode(temp, "Not Missing" = "0", "Missing" = "1", "Missing" = "2")

but this still results in only a single legend entry (albeit the correct one, now).

In all of these examples, as well as the 3rd DALEX one, the shape legend is broken (symbols not rendered).

Let us know if there's anything we can do to help reach a solution.

maksymiuks commented 2 years ago

Thank you @jonocarroll so much for your help.

First of all I fixed broken legend for shape. For some reason default_aes NA for color was causing that. When I switched it to "black" it works just fine right now. Additionally that default coloring by missingness is handled by stat anyway, so that change to default_aes does not cause any regression.

Regarding label, good catch, I implemented your suggestions along with appropriate comment, and it works (the label is correct) although I'm not happy with the color. For some reason this Not Missing level is dropped even if we have labels as factors. This result in Missing using the color for Not Missing. This however only happens if whole x or y is full NA, which isn't necessarily very common.

That's why Not Missing is a first level as its way more common to have sub-data frames that have some missing and no missing at all (then colors are ok), than a sub-data frame with all NAs.

njtierney commented 2 years ago

Thanks so much for this! This is excellent. I'll take a quick look soon at this locally but I'm sure it's all good.

njtierney commented 1 year ago

Hi @jonocarroll and @maksymiuks - apologies this has taken me forever to look into this.

I'm a bit stuck on the fact that the colours are now swapped - I tried to change the factor order to the opposite, but this didn't work, but I'd really like to have this keep the colours the same as previous, I'm not quite sure how to manage this - do you have any thoughts?

jonocarroll commented 1 year ago

As far as I know (and that may be vastly out of date) if you want a colour to be stable you need to specify it with a manual scale. I'd suggest manually specifying colours for 'Missing' and 'Not Missing' regardless of the data, i.e. always having those legend entries present.

njtierney commented 1 year ago

OK I'll do that I reckon - that will also mean I can target https://github.com/njtierney/naniar/issues/148 at the same time, I've been meaning to set the colours to something colourblind safe!

njtierney commented 1 year ago

It is not as simple as I thought to set the colours manually via the custom geom - have you had experience doing this, @jonocarroll ?

jonocarroll commented 1 year ago

I haven't, sorry. I can try to reacquaint myself with this code sometime and see what I can do.

njtierney commented 1 year ago

Thanks! I'll try and have a stab as well, it's been a while since I've written a geom

njtierney commented 1 year ago

While I would prefer if the colours stayed the same, I think this solves an important issue so let's merge this once checks are done :)

njtierney commented 1 year ago

Resolves #290

njtierney commented 1 year ago

Aha! Figured it out - just needed to "revel" the factor levels