topepo / C5.0

An R package for fitting Quinlan's C5.0 classification model
https://topepo.github.io/C5.0/

Converting “C50 models” to “rpart” models #40

Open swaheera opened 3 years ago

swaheera commented 3 years ago

I am trying to find out whether the "rpart.plot" library can plot decision trees stored in objects that are not "rpart" objects.

For instance, here is the classic "rpart" and "rpart.plot" library in action:

    # load libraries
    library(rpart)
    library(rpart.plot)

    # load data
    data(iris)

    # fit rpart model (i.e. decision tree)
    r <- rpart(Species ~ ., data = iris)

    # plot model
    rpart.plot(r)

https://i.stack.imgur.com/S8Y6J.png

Problem: I am working on a multi-class classification problem (similar to the example above, which uses the well-known iris dataset purely for illustration) where "rpart" takes too long to run: after 10 hours the "rpart" call had still not finished.

However, I found another R library called "C50", which fits a similar model almost instantly:

    # load library
    library(C50)

    # run same model
    tree_mod <- C5.0(x = iris[, -5], y = iris$Species, rules = TRUE)

    # view model
    summary(tree_mod)

    plot(tree_mod)

https://i.stack.imgur.com/WTJx4.png

Question: Is there any way to use the "rpart.plot" library with objects from the "C50" library?

For example:

#my attempt
rpart.plot(tree_mod)

Error in rpart.plot(tree_mod) : Not an rpart object
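
My reading of this error is that rpart.plot() only accepts objects that inherit from the class "rpart", while C5.0() returns an object of class "C5.0". The sketch below illustrates that kind of class check with a stand-in object so it runs without any packages; `mock_c50` is a hypothetical placeholder, not a real fitted model, and the exact check inside rpart.plot is an assumption based on the error message.

```r
# Stand-in for a fitted C5.0 model: an empty list carrying only the class
# attribute. This is a hypothetical placeholder, not a real model.
mock_c50 <- structure(list(), class = "C5.0")

# A check of this kind is presumably what triggers "Not an rpart object".
is_rpart <- inherits(mock_c50, "rpart")

print(class(mock_c50))
print(is_rpart)
```

Since the object does not carry the "rpart" class, any conversion would have to rebuild an rpart-style structure rather than just relabel the C5.0 object.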

My idea: The "C50" library can print the individual rules of the model:

summary(tree_mod)

Rule 1: (50, lift 2.9)
    Petal.Length <= 1.9
    ->  class setosa  [0.981]

Rule 2: (48/1, lift 2.9)
    Petal.Length > 1.9
    Petal.Length <= 4.9
    Petal.Width <= 1.7
    ->  class versicolor  [0.960]

Rule 3: (46/1, lift 2.9)
    Petal.Width > 1.7
    ->  class virginica  [0.958]

Rule 4: (46/2, lift 2.8)
    Petal.Length > 4.9
    ->  class virginica  [0.938]
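
As a first step toward reformatting, the printed rules above could be parsed into a data frame with base R. This is only a sketch: `parse_c50_rules()` is a hypothetical helper (not part of "C50" or "rpart.plot"), the rule text is pasted in as a string, and I am assuming the same text could be taken from the fitted object (for example from `tree_mod$output`). It does not produce an "rpart" object.

```r
# Rule text copied from the summary(tree_mod) output above.
rule_text <- "
Rule 1: (50, lift 2.9)
    Petal.Length <= 1.9
    ->  class setosa  [0.981]

Rule 2: (48/1, lift 2.9)
    Petal.Length > 1.9
    Petal.Length <= 4.9
    Petal.Width <= 1.7
    ->  class versicolor  [0.960]

Rule 3: (46/1, lift 2.9)
    Petal.Width > 1.7
    ->  class virginica  [0.958]

Rule 4: (46/2, lift 2.8)
    Petal.Length > 4.9
    ->  class virginica  [0.938]
"

# Hypothetical helper: turn the printed rule text into one row per rule.
parse_c50_rules <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]

  # Each rule starts at a "Rule N:" line and runs to the next such line.
  starts <- grep("^Rule [0-9]+:", lines)
  ends <- c(starts[-1] - 1L, length(lines))

  rows <- Map(function(s, e) {
    block <- lines[s:e]
    outcome <- grep("^->", block, value = TRUE)[1]
    # Everything that is neither the header nor the outcome is a condition.
    conds <- block[!grepl("^(Rule [0-9]+:|->)", block)]
    data.frame(
      rule = as.integer(sub("^Rule ([0-9]+):.*$", "\\1", block[1])),
      class = sub("^-> +class +([^ ]+).*$", "\\1", outcome),
      confidence = as.numeric(sub("^.*\\[([0-9.]+)\\] *$", "\\1", outcome)),
      when = paste(conds, collapse = " & "),
      stringsAsFactors = FALSE
    )
  }, starts, ends)

  do.call(rbind, rows)
}

rules_df <- parse_c50_rules(rule_text)
print(rules_df)
```

The `when` column joins the conditions with `&`, loosely mirroring the layout that rpart.rules() prints. Note that the C5.0 rules can overlap (rules 3 and 4 both predict virginica), so they do not necessarily map onto a single tree.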

Similar rules can also be extracted from the "rpart" model, using rpart.rules() from the "rpart.plot" library:

 rpart.rules(r)

    Species  seto vers virg                                               
     setosa [1.00  .00  .00] when Petal.Length <  2.5                     
 versicolor [ .00  .91  .09] when Petal.Length >= 2.5 & Petal.Width <  1.8
  virginica [ .00  .02  .98] when Petal.Length >= 2.5 & Petal.Width >= 1.8

Is it somehow possible to "reformat the rules" from the "C50" library in such a way that they become compatible with "rpart.plot"?

Thanks