topepo / C5.0

An R package for fitting Quinlan's C5.0 classification model
https://topepo.github.io/C5.0/

Error in FUN(X[[i]], ...) : 'list' object cannot be coerced to type 'double' when using plot() #29

Open KyrMitsos opened 3 years ago

KyrMitsos commented 3 years ago

Hi guys,

I have a data frame with 4 predictors and one response variable. I am following your examples for the C50 library and its very useful plot function. However, I get this message when I run plot() on the fitted model:

fit <- C5.0(y=mydata$RES_B, x=mydata[,-5], trials=1)
plot(fit)
Error in FUN(X[[i]], ...) : 'list' object cannot be coerced to type 'double'

I traced through the code and the culprit is function as.party() (line 121 in as.party.C5.0.R)
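For readers unfamiliar with the message: it is a generic base R coercion error, not one specific to C50. The sketch below shows one common way it arises (converting a list that contains an element longer than one value). It is only an illustration of the message and is not a diagnosis of what happens inside as.party().

```r
# A list whose elements all have length 1 converts without complaint.
ok <- as.numeric(list(1, 2, 3))
print(ok)

# A list containing a longer element cannot be flattened to a double
# vector, so as.numeric() signals an error. tryCatch() captures it.
res <- tryCatch(
  as.numeric(list(1:2, 3)),
  error = function(e) e
)
print(conditionMessage(res))
```

The exact wording of the message differs slightly between R versions, but it ends in "cannot be coerced to type 'double'".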

My data looks like this:

summary(mydata)
       A                B                C          RES_A     RES_B
 Min.   : 1.010   Min.   : 1.430   Min.   : 1.010   D:23640   D:31626
 1st Qu.: 1.710   1st Qu.: 3.000   1st Qu.: 2.250   E:18443   E:23251
 Median : 2.150   Median : 3.250   Median : 2.950   F:32177   F:19383
 Mean   : 2.533   Mean   : 3.452   Mean   : 3.491
 3rd Qu.: 2.800   3rd Qu.: 3.600   3rd Qu.: 4.050
 Max.   :45.000   Max.   :35.000   Max.   :55.000

First four vars are the predictors and RES_B is the response.

When line 121 is called it calls lapply, passing X and FUN with X = [1:13] and FUN =

function (i) {
  valpred <- integer(0)
  vec <- strsplit(out[i], ":")[[1]]
  vec <- vec[vec != ""]
  varp <- as.vector(sapply(adj.pred, function(j) {
    ind <- grep(paste0(j, " "), vec)
    if (length(ind) == 0) return(-1)
    return(ind)
  }))
  if (!any(varp > 0)) {
    stop("Variable match was not found.")
  }
  valpred <- as.vector(which(varp > 0))
  valpred <- valpred[which.max(nchar(adj.pred[valpred]))]
  a1 <- gsub(obj$pred[valpred], "", out[i])
  if (n.cat[valpred]) {
    if (length(grep(" in \\{", a1)) > 0) {
      vec <- a1
      while (length(grep("^in", vec)) == 0) {
        vec <- sub("^.", "", vec)
      }
      a2 <- sub("in \\{", "", vec)
      if (length(grep(":", a2)) > 0) {
        a2 <- strsplit(a2, "\\}:")
        if (length(a2) > 2) {
          stop("The code currently does not work with factor levels or responses that have the symbol '}:' in them.")
        }
      } else {
        a2 <- sub("\\}$", "", a2)
      }
      a2 <- a2[[1]][1]
      a1 <- sub(a2, "X", vec)
      a2 <- paste0("{", a2, "}", collapse = "")
    } else {
      vec <- a1
      while (length(grep("^=", vec)) == 0) {
        vec <- sub("^.", "", vec)
      }
      a2 <- sub("^= ", "", vec)
      a2 <- strsplit(a2, ":")
      if (length(a2) > 2) {
        stop("The code currently does not work with factor levels or responses that have the symbol ':' in them.")
      }
      a2 <- a2[[1]][1]
      a1 <- sub(a2, "X", vec)
    }
  }
  a1 <- strsplit(a1, " ")[[1]]
  a1 <- gsub(":", "", a1)
  a1 <- gsub("\\.\\.\\.", "", a1)
  a1 <- a1[a1 != ""]
  if (n.cat[valpred]) {
    a1[2] <- a2
  }
  as.vector(c(adj.pred[valpred], a1))
}

This is as much digging as I am willing to do at this point. Unfortunately, I have been unable to find an earlier report of this problem. I hope you can reproduce the issue.

Thank you in advance for your efforts!

topepo commented 3 years ago

Was mydata a tibble?

KyrMitsos commented 3 years ago

Hi Max,

No, it is a data.frame.

topepo commented 3 years ago

Ok. I'm not sure what I can do without a reproducible example. Can you provide one (hopefully via reprex())?

KyrMitsos commented 3 years ago

OK. Here you are:

library(C50)
library(partykit)
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm
# BASIC DATA
#prep data
mydata <- df.kosher[,c(indices.outcome, 39, 40)]
#> Error in eval(expr, envir, enclos): object 'df.kosher' not found
sets <- getTrainAndTestSamples(mydata)
#> Error in getTrainAndTestSamples(mydata): could not find function "getTrainAndTestSamples"
trainset <- sets$train
#> Error in eval(expr, envir, enclos): object 'sets' not found
testset <- sets$test
#> Error in eval(expr, envir, enclos): object 'sets' not found
mydata$RES_HALF <- as.factor(mydata$RES_HALF)
#> Error in is.factor(x): object 'mydata' not found
mydata$RES_FINAL <- as.factor(mydata$RES_FINAL)
#> Error in is.factor(x): object 'mydata' not found
# fit model
# fit <- C5.0(RES_FINAL~., data=mydata, trials=1)
fit <- C5.0(y=mydata$RES_FINAL, x=mydata[,-5], trials=1)
#> Error in C5.0(y = mydata$RES_FINAL, x = mydata[, -5], trials = 1): object 'mydata' not found
# summarize the fit
print(fit)
#> Error in print(fit): object 'fit' not found
# make predictions
predictions <- predict(fit, mydata[,-5])
#> Error in predict(fit, mydata[, -5]): object 'fit' not found
# summarize accuracy
confusionMatrix(as.factor(predictions), as.factor(mydata$RES_FINAL))
#> Error in confusionMatrix(as.factor(predictions), as.factor(mydata$RES_FINAL)): could not find function "confusionMatrix"
plot(fit)
#> Error in plot(fit): object 'fit' not found

Created on 2021-05-08 by the reprex package (v2.0.0)

KyrMitsos commented 3 years ago

As for 'mydata'

'data.frame': 86935 obs. of 5 variables:
 $ A        : num 1.5 2.4 3 3.45 1.57 5.5 2.05 2.05 2 3.65 ...
 $ B        : num 3.75 3.3 3 3.45 3.65 4.35 3.35 3.4 3.5 3.55 ...
 $ C        : num 4.55 2.3 2.05 1.75 4.25 1.35 2.75 2.75 2.75 1.7 ...
 $ RES_HALF : Factor w/ 3 levels "A","B","C": 2 2 2 3 1 3 3 1 2 3 ...
 $ RES_FINAL: Factor w/ 3 levels "A","B","C": 3 2 3 3 1 2 3 1 1 2 ...

Make sure you reproduce this kind of data.frame and use that: 5 variables, of which three are numeric and two are factors with 3 levels each. Make it random; it doesn't need to have any more structure or meaning than this. Thank you for attempting to fix this.

topepo commented 3 years ago

That's not really reproducible. I don't have the data associated with your issue.
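One common way to make a report like this self-contained is to embed a small piece of the data in the script itself with dput(). The sketch below uses a tiny made-up data frame named small (a hypothetical stand-in, not the reporter's data) to show the round trip: dput() prints R code, and evaluating that code rebuilds the object in a fresh session.

```r
# A small illustrative data frame. In practice this would be a subset of
# the real data, for example head(mydata, 20).
small <- data.frame(
  A = c(1.5, 2.4, 3),
  B = c(3.75, 3.3, 3),
  RES_FINAL = factor(c("C", "B", "C"), levels = c("A", "B", "C"))
)

# dput() writes R code that rebuilds the object; capture it as text.
code <- paste(capture.output(dput(small)), collapse = "\n")
cat(code, "\n")

# Evaluating that text recreates the data frame, which is what a reader
# of the issue would do after pasting it into a clean session.
rebuilt <- eval(parse(text = code))
print(all.equal(small, rebuilt))
```

Pasting the printed structure(...) call at the top of a reprex means the script no longer depends on objects that exist only in the reporter's workspace.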

KyrMitsos commented 3 years ago

Can you not simulate the same dataframe? Just create a random one. You need 3 numerics and 2 Factors of 3 (same levels) each.
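A simulated data frame of the shape described here could look like the sketch below. The column names and types follow the str() output above; the name sim_data, the seed, the row count, and the uniform ranges are assumptions made for illustration. Nothing here claims that random data of this shape triggers the error; the try() call only records whether plot() fails.

```r
# Hypothetical stand-in for `mydata`: same column names and types as in
# the str() output, but the values are random and carry no signal.
set.seed(1)  # arbitrary seed so the draw is repeatable
n <- 500     # assumption: far fewer rows than the original 86935

lev <- c("A", "B", "C")
sim_data <- data.frame(
  A = round(runif(n, 1, 45), 2),
  B = round(runif(n, 1.4, 35), 2),
  C = round(runif(n, 1, 55), 2),
  RES_HALF  = factor(sample(lev, n, replace = TRUE), levels = lev),
  RES_FINAL = factor(sample(lev, n, replace = TRUE), levels = lev)
)
str(sim_data)

# Fit and plot only if the C50 package is installed. try() keeps the
# script running whether or not plot() fails on this particular data.
if (requireNamespace("C50", quietly = TRUE)) {
  fit <- C50::C5.0(x = sim_data[, -5], y = sim_data$RES_FINAL, trials = 1)
  plot_result <- try(plot(fit), silent = TRUE)
  print(class(plot_result))
}
```

If plot_result has class "try-error", the script doubles as a reproducible example; if not, the simulated data does not reproduce the problem and a subset of the real data would be needed.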

I don't think the particular data frame is at fault, and I don't think it has any special properties. Otherwise, tell me and I can send you an .rd file with the workspace variables from my RStudio session. Thanks.

topepo commented 3 years ago

I can't hunt for data sets that I know will create the same error that you encountered.

KyrMitsos commented 3 years ago

I personally believe this is not down to a specific dataset. I guess it can happen with any dataset that has the same characteristics.

If what you say is true, then aren't you motivated to get to the bottom of this bug? Isn't that your goal? To provide a bug free library to the people?

What can I do to help you resolve this bug?

skanskan commented 3 years ago

I have the same problem. The model is fitted properly but I'm not able to plot it. If I try to plot it with the option rules="T" it produces this error:

  Error: tree models only

And if I use rules="F" it says:

  Error in FUN(X[[i]], ...) : 
     'list' object cannot be coerced to type 'double'

I'm using C5.0 v0.13.1 in R v4.0.2 on Windows 10.