scale_fill_manual() drains incidence curves of their color when groups not present

pbkeating commented 5 years ago

whatsapp image 2018-12-02 at 23 50 45

First time posting an issue, so happy to change as needed. I've been using the incidence plot to plot cases by date of onset against for example whether cases were confirmed/probable and this by health zone.

In some health zones, we have confirmed and probable cases and the plot is perfect. However, in other health zones, we only have confirmed cases and in this scenario, the plot is no longer an appropriate colour and the legend is incorrect (See attached image). I'm not sure if this is a ggplot issue or an incidence package issue.

pbkeating commented 5 years ago

20181206_195857

What the output looks like normally

zkamvar commented 5 years ago

Can you copy + paste the code you use to create the incidence object and the plot? There seems to be a lot of customisation here, and seeing the code may help me figure out what the issue is.

pbkeating commented 5 years ago

epicurve_overall_conprob_notif <- incidence(vhf_conprob$DateReport, 
                                            interval = 7, 
                                            group = vhf_conprob$EpiCaseDef)

# Plot and adjust as necessary
plot(epicurve_overall_conprob_notif, stack = TRUE, border = "black") +
  ggtitle("Courbe épidemique: cas MVE par definition de cas et\nsemaine de notification de cas") +
  aes(fill = factor(groups, unique(groups))) + # needed to make sure the factor order is respected
  theme_classic()+
  theme(
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 8, vjust = .5, angle = 90),
    axis.text.y = element_text(size = 8),
    axis.ticks.x = element_line(size = .15),
    legend.text = element_text(size = 8),
    legend.key.size = unit(1, 'lines'))+ 
    xlab("Semaine de notification") +
    ylab("Nombre de cas") +
  scale_x_date(date_breaks = "1 week", date_labels =  "%b %d") +
scale_fill_manual(name = "Statut du cas",
                      breaks = c("Confirmé", "Probable"),
                      labels = c("Confirmé", "Probable"), values = c("Probable" = "blue", "Confirmé" = "red"))

pbkeating commented 5 years ago

That would be the general code, but then I would apply that to subsets of the data, based on health zone and that is when I would encounter the problem with colour and legends. I'm thinking it's likely ggplot not able to handle that the pre-defined grouping in scale_fill_manual is not there.

zkamvar commented 5 years ago

What version of incidence do you have?

zkamvar commented 5 years ago

You can use packageVersion("incidence") to find out

pbkeating commented 5 years ago

1.5.2. Thanks for that. I ended up looking at package section next to help to find out.

zkamvar commented 5 years ago

One of the things that I notice here is that the erroneous plot has the fill labeled "a". This is a bit of a trick that we have to set a fill color. Looking at your code, I'm still not quite sure how "a" managed to show itself.

pbkeating commented 5 years ago

Maybe I should send the full code where the a appears? I used it as part of a loop.

zkamvar commented 5 years ago

1.5.2. Thanks for that. I ended up looking at package section next to help to find out.

Then you do not need this line, since grouping was fixed in 1.5.2:

aes(fill = factor(groups, unique(groups))) + # needed to make sure the factor order is respected

pbkeating commented 5 years ago

Awesome

zkamvar commented 5 years ago

Maybe I should send the full code where the a appears? I used it as part of a loop.

If it doesn't contain sensitive information, then yes, please send it.

pbkeating commented 5 years ago

# Create a function to make epicurves by Case definition and date of onset

epicurve_debut_symptomes <- function(data, y ){
epicurve_dds <- incidence(data, interval = 7, 
                                            group = y, na_as_group = FALSE)

# Plot and adjust as necessary
plot(epicurve_dds, stack = TRUE,  border = "black" ) +
  theme_classic()+
  theme(
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 8, vjust = .5, angle = 90),
    axis.text.y = element_text(size = 8),
    axis.ticks.x = element_line(size = .15),
    legend.title = element_blank(),
    legend.text = element_text(size = 8),
    legend.key.size = unit(1, 'lines'))+ 
     xlab("Semaine de début de symptômes")+
    ylab("Nombre de cas") +
  scale_x_date(date_breaks = "1 week", date_labels =  "%b %d")
}

# Identify all unique zone de santé to loop over
zs <- unique(vhf_conprob$SCRes)

# IDentify the variables of interest to create an epicurve by date of onset
variables <- c("EpiCaseDef", "HCW", "AgeGrp", "StatusAsOfCurrentDate")

# With this loop, we can tailor the ggplot for each specific variable (EpiCaseDef, HCW etc) using if else

for (i in zs) {
  for (j in variables){
  intermediate <- epicurve_debut_symptomes(data = vhf_conprob[vhf_conprob$SCRes ==i,"DateOnset"], y = vhf_conprob[vhf_conprob$SCRes == i , j])
  if ( j == "EpiCaseDef") {
    intermediate <- intermediate + 
      ggtitle(paste("Courbe épidemique: cas MVE par definition du cas et\nsemaine de début des symptômes et dans la zone de santé de", i)) +
      scale_fill_manual(name = "Statut du cas",
                       values = setNames(c("blue", "red"), c("Probable", "Confirmé")))
  } else if (j == "HCW") {
    intermediate <- intermediate +
      ggtitle(paste("Courbe épidemique: cas MVE par statut agent de santé et\nsemaine de début des symptômes et dans la zone de santé de", i)) +
      scale_fill_brewer(palette ="Paired")
  } else if (j == "AgeGrp") {
    intermediate <- intermediate +
      ggtitle(paste("Courbe épidemique: cas MVE par tranche d'âge et\nsemaine de début des symptômes et dans la zone de santé de", i)) +
      scale_fill_manual(breaks = c("<5ans", "5-14ans", "15-24ans", "25-34ans", "35-44ans", "45-59ans", "60+ans"),
                    values =setNames(c("#A6CEE3", "#1F78B4","#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C", "#FDBF6F", "#FF7F00"), c("<5ans", "5-14ans", "15-24ans", "25-34ans", "35-44ans", "45-59ans", "60+ans")))
  } else{
    intermediate <- intermediate +
      ggtitle(paste("Courbe épidemique: cas MVE par statut actuel et\nsemaine de début des symptômes et dans la zone de santé de", i)) +
      scale_fill_manual(name = "Statut actuel du cas",
                      breaks = c("Decede", "Vivant"),
                      labels = c("Décédé", "Vivant"), values = c("Decede" = "red", "Vivant" = "blue"))
  }
  print(intermediate)
}
}

pbkeating commented 5 years ago

That was a bit of a breakthrough loop for me and I'm sure it could be better done with one of the apply family, but those guys still cause me headaches

zkamvar commented 5 years ago

That was a bit of a breakthrough loop for me and I'm sure it could be better done with one of the apply family, but those guys still cause me headaches

Definitely understood! Sometimes you just have to get something working and then you polish it later :)

zkamvar commented 5 years ago

Also, you shouldn't have to specify scale_fill_manual(), you can set the colors by group by using the color argument in plot(): https://www.repidemicsconsortium.org/incidence/articles/customize_plot.html#example-2-changing-several-colors-note-that-naming-colors-is-optional

zkamvar commented 5 years ago

AHA! I found the culprit! Let me make a quick reprex and I'll report back

zkamvar commented 5 years ago

library("incidence")
library("ggplot2")
i <- incidence(rpois(100, 5))
plot(i, border = "black") + scale_fill_manual(name = "test", values = c("Probable" = "blue", "Confirmé" = "red"))
#> Scale for 'fill' is already present. Adding another scale for 'fill',
#> which will replace the existing scale.

^{Created on 2018-12-06 by the reprex package (v0.2.1)}

pbkeating commented 5 years ago

Awesome!

As a side note, I just tried one problematic health zone and when I defined colour within plot, it came back colourless. This is one of the health zones that only has confirmed cases.

# Create an epicurve by Case definition
epicurve_overall_conprob_debut <- incidence(vhf_conprob[vhf_conprob$SCRes == "Vuhovi", "DateOnset"],
                                            interval = 7, 
                                            group = vhf_conprob[vhf_conprob$SCRes == "Vuhovi", "EpiCaseDef"])

# Plot and adjust as necessary
plot(epicurve_overall_conprob_debut, stack = TRUE,  border = "black", 
     color = c(confirme = "red", probable = "blue")) +
  ggtitle("Courbe épidemique: cas MVE par definition de cas et\nsemaine de début des symptômes") +
  theme_classic()+
  theme(
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 8, vjust = .5, angle = 90),
    axis.text.y = element_text(size = 8),
    axis.ticks.x = element_line(size = .15),
    legend.text = element_text(size = 8),
    legend.key.size = unit(1, 'lines'))+ 
    xlab("Semaine de début des symptômes") +
    ylab("Nombre de cas") +
  scale_x_date(date_breaks = "1 week", date_labels =  "%b %d")

zkamvar commented 5 years ago

As a side note, I just tried one problematic health zone and when I defined colour within plot, it came back colourless. This is one of the health zones that only has confirmed cases.

It looks like you found a bug! I am a bit knackered now, but I'll have a go at a patch tomorrow morning.

pbkeating commented 5 years ago

Zhian, thanks a million for looking into this! Much appreciated :)

zkamvar commented 5 years ago

In essence, incidence counts the number of groups you have. If you have one group, then it thinks "gee, I really don't need a group name" and gives you a plot assuming you have no groups and adding the placeholder "a" instead of a group name. This allows the incidence fit curve to be seamlessly added. Of course, there are plenty of reasons why you would want to name a single group on a curve! In essence, what I'm trying to say is, the problem isn't your code:

zkamvar commented 5 years ago

Thanks for submitting the issue!

pbkeating commented 5 years ago

In essence, incidence counts the number of groups you have. If you have one group, then it thinks "gee, I really don't need a group name" and gives you a plot assuming you have no groups and adding the placeholder "a" instead of a group name. This allows the incidence fit curve to be seamlessly added. Of course, there are plenty of reasons why you would want to name a single group on a curve! In essence, what I'm trying to say is, the problem isn't your code:

So funny! Thanks for that and thanks for looking into it!

zkamvar commented 5 years ago

Hi @pbkeating, the latest version on github should fix your problem. You can install it with remotes:

remotes::install_github("reconhub/incidence@release")

pbkeating commented 5 years ago

Hi Zhian. Will try it out and let you know! Thanks a lot!

pbkeating commented 5 years ago

Hi Zhian. Just wanted to check if should uninstall the regular incidence package when using the above version? Also, do I still apply library to incidence above or use incidence@release:: or something to that effect?

pbkeating commented 5 years ago

Nevermind. I see that it's already been incorporated! Works perfectly! thanks a lot.

reconhub / incidence

scale_fill_manual() drains incidence curves of their color when groups not present #84