Open balwierz opened 7 years ago
The order of observations is not a reliable way to assign attributes. To get the behavior you want, make the group variable a factor and assign line attributes in order of the levels of that variable. If you want levels to be defined by the order of first appearance in the data (not a recommended programming practice), use something like g <- factor(x, levels=unique(x)).
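The difference between the two level orderings can be sketched in base R. This is a minimal illustration, not code from Hmisc: the short `grp` vector is made up for the example, and the commented-out `Ecdf` call assumes that Hmisc is installed and that `Ecdf` keeps the level order of a factor passed as `group`.

```r
# Hypothetical grouping vector in which "red" appears before "blue"
grp <- c(rep("red", 3), rep("blue", 3))

# as.factor() on a character vector sorts the levels
alphabetical <- levels(as.factor(grp))

# Explicit levels keep the order of first appearance
g <- factor(grp, levels = unique(grp))
first_seen <- levels(g)

print(alphabetical)
print(first_seen)

# With Hmisc loaded, the call from the report would then become
# (assumption: Ecdf uses the factor's levels as given):
# Ecdf(x = c(reds, blues), group = g, col = c("red", "blue"))
```

With the levels fixed this way, the i-th entry of `col` is meant for the i-th level of `g`, which is the pairing the reply describes.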
When there are multiple curves and the group argument is used, the colours might get rearranged. Steps to reproduce:
library(Hmisc)
reds <- rnorm(n=100, mean=5, sd=1)
blues <- rnorm(n=100, mean=0, sd=1)
Ecdf(x=c(reds, blues), group=c(rep("red", length(reds)), rep("blue", length(blues))), col=c("red", "blue"))
Observe that the reds distribution is plotted in blue, and the blues distribution is plotted in red. This is because in ecdf.s group is converted to a factor:
group <- as.factor(group)
lev <- levels(group)
nlev <- length(lev)
Factor levels are sorted, not kept in the order of first occurrence, so lev is now in alphabetical order. In the for loop over the nlev curves to be plotted, the data are selected using that alphabetical order. In our case the "blue" level is used first (i=1):
s <- group == lev[i]
x <- X[s]
But the colours are used in the original order:
lines(x, y, type="s", lty=lty[i], col=col[i], lwd=lwd[i])
In this case col[1] is still "red".
I consider it a serious bug. I have been presenting my research results based on Ecdf numerous times with no curve labelling...
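The mismatch described above can be reproduced without plotting. The following base R sketch mirrors the level/colour pairing from the quoted lines; the names `pairing`, `col_by_name` and `col_in_level_order` are hypothetical and do not appear in Hmisc. It also shows one possible caller-side safeguard: look the colours up by level name before passing them on, rather than relying on their position.

```r
# Same grouping and colours as in the steps to reproduce
group <- c(rep("red", 100), rep("blue", 100))
col <- c("red", "blue")

# What the quoted code computes: sorted levels
lev <- levels(as.factor(group))

# Positional pairing used in the plotting loop: lev[i] is drawn with col[i]
pairing <- data.frame(level = lev,
                      colour = col[seq_along(lev)],
                      stringsAsFactors = FALSE)
print(pairing)

# Caller-side safeguard (an assumption, not part of Hmisc):
# name the colours and reorder them to match the levels
col_by_name <- c(red = "red", blue = "blue")
col_in_level_order <- unname(col_by_name[lev])
print(col_in_level_order)
```

Passing `col_in_level_order` as the `col` argument would give each level its intended colour, since the i-th colour then belongs to the i-th level.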