wilkelab / ggridges

Ridgeline plots in ggplot2
https://wilkelab.org/ggridges
GNU General Public License v2.0

point_color scale when there are missing factor levels #10

Open smouksassi opened 6 years ago

smouksassi commented 6 years ago

There seems to be some interaction between the color/fill scales and the point_color scale when a factor level has no data. Consider this example:

library(dplyr)
library(ggplot2)
library(ggridges)

# Set every setosa value to NA so that one factor level has no data
iristest <- iris
iristest$Sepal.Length <- ifelse(iristest$Species == "setosa", NA, iristest$Sepal.Length)
# Per-species range of Sepal.Length (NA for setosa)
iristestlinerange <- iristest %>%
  dplyr::group_by(Species) %>%
  dplyr::summarize(xmin = min(Sepal.Length), xmax = max(Sepal.Length))
ggplot(iristest, aes(x = Sepal.Length, y = Species, fill = Species)) +
  stat_density_ridges(aes(point_color = Species), alpha = 0.2, jittered_points = TRUE) +
  geom_segment(data = iristestlinerange, size = 2, alpha = 0.4,
               aes(x = xmin, xend = xmax, y = Species, yend = Species,
                   col = Species, group = Species))

[Image: plotbug]

Why does the fill scale for the densities know that there is a missing factor level and start with colors 2 and 3, while the point_color scale behaves as if there were only two colors? (Of course we can override this using scale_discrete_manual, but I wanted to understand the intended design and the differences between the fill scale and the point_color scale.)
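For reference, the manual override mentioned above can be made independent of which levels have data by naming the colors. This is only a minimal sketch, not the package author's recommendation: the hex values and the name `species_colors` are placeholders chosen for illustration, and it assumes ggplot2 (3.0.0 or later, for `scale_discrete_manual()`) and ggridges are installed.

```r
library(ggplot2)
library(ggridges)

# Same setup as the example above: setosa has no usable x values.
iristest <- iris
iristest$Sepal.Length[iristest$Species == "setosa"] <- NA

# Hypothetical palette. The names tie each color to a factor level, so the
# mapping does not depend on which levels a given layer actually contains.
species_colors <- c(
  setosa     = "#0072B2",
  versicolor = "#D55E00",
  virginica  = "#009E73"
)

p <- ggplot(iristest, aes(x = Sepal.Length, y = Species, fill = Species)) +
  stat_density_ridges(aes(point_color = Species),
                      alpha = 0.2, jittered_points = TRUE) +
  # Use the same named palette for the fill and for the jittered points.
  scale_fill_manual(values = species_colors) +
  scale_discrete_manual(aesthetics = "point_color", values = species_colors)
p
```

Because manual scales match named values to levels by name, versicolor and virginica should get the same color in both scales whether or not setosa is present in a layer.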

clauswilke commented 6 years ago

Yeah, not sure, honestly. ggplot2 sometimes does unexpected things when you have factors where an entire level has no data, and that seems to be the case here. The points are actually behaving correctly, and the densities are not. This is all auto-decided by ggplot2, so I'd have to do some digging to figure out what's going on. Importantly, the mechanism I use to plot points and densities at the same time is somewhat hacky, so there may be strange things happening on occasion.

Do you have a real-world problem where this actually would be an issue? It seems to me that for this plot you should remove setosa from the data frame entirely, and then the problem is gone.
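A minimal sketch of that suggestion, assuming the iris example from the original post. The name `iris_sub` is made up for illustration, and `droplevels()` is added on the assumption that the empty level should also disappear from the factor itself:

```r
library(ggplot2)
library(ggridges)

# Remove the rows for setosa, then drop the now-unused level from the factor.
iris_sub <- droplevels(subset(iris, Species != "setosa"))
levels(iris_sub$Species)

p <- ggplot(iris_sub, aes(x = Sepal.Length, y = Species, fill = Species)) +
  stat_density_ridges(aes(point_color = Species),
                      alpha = 0.2, jittered_points = TRUE)
p
```

With only two levels left in the data, every scale in the plot is trained on the same two groups.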

smouksassi commented 6 years ago

Hi Claus, thank you for getting back to me so quickly, and thanks for ggridges. It is an awesome, useful package that has enabled us to make very cool plots within the ggplot framework.

My real-life problems look like the following plot:

[Image: exposureresponse4a]

I had to add this code to fix the mismatch in colors:

library(ggthemes)  # provides tableau_color_pal()
plotwithwrongcolorpoints +
  scale_discrete_manual(aesthetics = "point_color",
                        values = tableau_color_pal('tableau10')(5)[2:5])

[Image: exposureresponse4]

Basically, we have a drug that is given at dose levels ranging from placebo (0 mg) up to 2400 mg. While placebo does have a response, all of its AUC values are by definition zero. Hence the blue color in the plot showing the drug action (probability of response). The researcher did not want me to show a distribution for it at zero, so my hack was to set the AUC for placebo to NA, and that is how I found this behavior of the density points. I am not sure if there is a better way to suppress the density for a given group. (Another feature of this plot is that I compute what percentage of each distribution crosses the vertical lines of interest. That is not relevant to this issue, but it might be a nice feature in the future.)
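One possible alternative to the NA hack, sketched here on iris rather than the real dose data, is to give only the density layer a subset of the rows while the other layers keep every group. This is an untested idea for the real plot, not a confirmed ggridges recommendation. The names `density_groups` and `ranges` are hypothetical, and setosa stands in for placebo:

```r
library(ggplot2)
library(ggridges)

# Rows for the density layer only. subset() keeps all factor levels,
# so setosa is still a known level even though it has no rows here.
density_groups <- subset(iris, Species != "setosa")

# Per-species range for a layer that should still show every group.
ranges <- data.frame(
  Species = factor(levels(iris$Species), levels = levels(iris$Species)),
  xmin = as.numeric(tapply(iris$Sepal.Length, iris$Species, min)),
  xmax = as.numeric(tapply(iris$Sepal.Length, iris$Species, max))
)

p <- ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
  # Densities and jittered points are drawn only for the subset.
  geom_density_ridges(data = density_groups,
                      alpha = 0.2, jittered_points = TRUE) +
  # The segments use all three groups, so setosa keeps its row on the y axis.
  geom_segment(data = ranges, alpha = 0.4,
               aes(x = xmin, xend = xmax, y = Species, yend = Species,
                   colour = Species))
p
```

The sketch leaves `point_color` unmapped to keep it short. If the points are colored by group as in the original plot, the default colors may still disagree between scales, so a manual scale would likely still be needed.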