harrelfe / Hmisc

Harrell Miscellaneous

Ecdf: curves colours plotted in a different order than specified #55

Open balwierz opened 7 years ago

balwierz commented 7 years ago

When there are multiple curves and the group argument is used, the colours may be assigned to the wrong curves. Steps to reproduce:

    library(Hmisc)
    reds <- rnorm(n=100, mean=5, sd=1)
    blues <- rnorm(n=100, mean=0, sd=1)
    Ecdf(x=c(reds, blues), group=c(rep("red", length(reds)), rep("blue", length(blues))), col=c("red", "blue"))

Observe that the reds distribution is plotted in blue, and the blues distribution is plotted in red.

This happens because in ecdf.s group is converted to a factor:

    group <- as.factor(group)
    lev <- levels(group)
    nlev <- length(lev)

Factor levels are not guaranteed to follow the order of first occurrence. For a character vector they are sorted, so lev is now in alphabetical order ("blue", "red").
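The level ordering can be checked without Hmisc. The following is a minimal base R sketch, assuming the same group labels as in the reproduction steps; the variable name first_seen is introduced here only for illustration.

```r
# Group labels in the same order as the report: 100 "red", then 100 "blue".
group <- c(rep("red", 100), rep("blue", 100))

# as.factor() on a character vector sorts the unique values to form the levels.
lev <- levels(as.factor(group))
print(lev)

# For comparison: the order in which the labels first appear in the data.
first_seen <- unique(group)
print(first_seen)
```

Here lev comes out sorted, while first_seen keeps the order the caller wrote the data in, which is the order the col argument was written against.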

In the for loop over the nlev curves to be plotted, the data are selected using this alphabetical order. In our case the "blue" level is used first (i=1):

    s <- group == lev[i]
    x <- X[s]

But the colours are indexed in the order they were passed:

    lines(x, y, type="s", lty=lty[i], col=col[i], lwd=lwd[i])

In this case col[1] is still "red", so the "blue" data are drawn in red.
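The mismatch described above can be reproduced without plotting anything. This is only a sketch of the index pairing, not the actual Ecdf code; the data frame pairing and the vector mismatched are hypothetical names used for illustration.

```r
# Same inputs as in the report.
group <- c(rep("red", 100), rep("blue", 100))
col <- c("red", "blue")

# Levels as the loop would see them after as.factor().
lev <- levels(as.factor(group))

# Pair the i-th level with the i-th colour, as lev[i] and col[i] are paired in the loop.
pairing <- data.frame(level = lev,
                      colour = col[seq_along(lev)],
                      stringsAsFactors = FALSE)
print(pairing)

# TRUE wherever a group is drawn in a colour other than the one intended for it.
mismatched <- pairing$level != pairing$colour
print(mismatched)
```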

I consider it a serious bug. I have been presenting my research results based on Ecdf numerous times with no curve labelling...

harrelfe commented 7 years ago

The order of observations is not a reliable way to assign attributes. To get the behavior you want, make the group variable a factor and assign line attributes in order of the levels of that variable. If you want levels to be defined by the order of first appearance in the data (not a recommended programming practice), use something like g <- factor(x, levels=unique(x)).
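As a rough illustration of the advice in this reply, the base R sketch below shows three ways a caller might line up colours with factor levels. The names g_first, g_explicit, palette_by_group and cols are assumptions made for this example and do not come from Hmisc.

```r
group <- c(rep("red", 100), rep("blue", 100))

# Option from the reply: levels in order of first appearance in the data.
g_first <- factor(group, levels = unique(group))

# Alternative: state the level order explicitly, independent of the data order.
g_explicit <- factor(group, levels = c("red", "blue"))

# Alternative: keep the default (sorted) levels and reorder the colours to match,
# looking them up by name in a named vector.
palette_by_group <- c(red = "red", blue = "blue")
cols <- unname(palette_by_group[levels(factor(group))])

print(levels(g_first))
print(levels(g_explicit))
print(cols)
```

With either g_first or g_explicit passed as group, col=c("red", "blue") would be expected to line up with the levels. With the default levels, cols would be passed as col instead.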