topepo / C5.0

An R package for fitting Quinlan's C5.0 classification model
https://topepo.github.io/C5.0/

When plotting a C50 tree with spaces in column names, there is an error #9

Closed rakeshnbabu closed 6 years ago

rakeshnbabu commented 7 years ago

Hello, please see the code below reproducing an error in R version 3.4.1 (2017-06-30) x86_64-w64-mingw32 on Windows 7:

data(iris)
library(C50)
names(iris) <- c("Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species")
myTree <- C5.0(Species ~ `Sepal Length` + `Sepal Width` + `Petal Length` + `Petal Width`, data=iris)
plot(myTree)

Error in if (!n.cat[i]) { : argument is of length zero
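For context on the message itself: R raises "argument is of length zero" whenever the condition passed to `if` evaluates to a zero-length vector. The sketch below reproduces that message in isolation. It assumes an English locale and does not identify why `n.cat[i]` is empty inside the package.

```r
# A zero-length condition triggers the same error text seen in the report.
# integer(0) stands in for whatever empty lookup result c5.split receives
# (an assumption for illustration only).
msg <- tryCatch(
  {
    if (!integer(0)) "unreached"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

A lookup that finds no match (for example, indexing by a name that is not present in a list) is one common way to end up with such an empty value.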

The issue appears to be with column names containing spaces.
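Until a fixed version is installed, one possible workaround is to convert the column names to syntactic R names before fitting. The sketch below uses base R's `make.names()`; the data frame names `spaced` and `safe` are illustrative, and the model fit only runs if the C50 package is available.

```r
data(iris)

# Recreate the data frame from the report, with spaces in the column names.
spaced <- iris
names(spaced) <- c("Sepal Length", "Sepal Width",
                   "Petal Length", "Petal Width", "Species")

# make.names() replaces characters that are invalid in R names with ".".
safe <- spaced
names(safe) <- make.names(names(spaced), unique = TRUE)
print(names(safe))

# Fit and plot only when C50 is installed; the sanitized names avoid spaces.
if (requireNamespace("C50", quietly = TRUE)) {
  fit <- C50::C5.0(Species ~ ., data = safe)
  plot(fit)
}
```

The trade-off is that the plot labels show the dotted names rather than the original ones.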

The error is thrown at line 3 of c5.split, but I have not been able to narrow down the root cause any further. A traceback is included below:

7: c5.split(vvec[ind[1]], bvec[ind], TRUE)
6: c5.node(treestr, indvars, cuts, vars)
5: as.partynode(c5.node(treestr, indvars, cuts, vars), from = 1L)
4: as.party.C5.0(x, trial = trial)
3: as.party(x, trial = trial)
2: plot.C5.0(myTree)
1: plot(myTree)

topepo commented 6 years ago

This should be fixed in the development version on GitHub (0.1.1.9000), if you would like to test it.
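A minimal sketch of how the development version could be installed and checked, assuming the `remotes` package and the repository path `topepo/C5.0` shown at the top of this page. The helper `has_fix()` and the `run_install` flag are hypothetical names, and treating every version at or above 0.1.1.9000 as fixed is an assumption based on the comment above.

```r
# Set to TRUE to actually install from GitHub (requires network access).
run_install <- FALSE

if (run_install) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("topepo/C5.0")
}

# Hypothetical helper: does a version string meet the version named above?
has_fix <- function(v) {
  package_version(v) >= package_version("0.1.1.9000")
}

# Report the installed version, if any, and whether it should include the fix.
if (requireNamespace("C50", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("C50"))
  message("C50 ", installed, " installed; includes fix: ", has_fix(installed))
}
```

After installing, rerunning the reproduction code from the original report is the direct way to confirm that `plot(myTree)` no longer fails.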