topepo / C5.0

An R package for fitting Quinlan's C5.0 classification model
https://topepo.github.io/C5.0/

as.party.C50 gives strange result with C5 with weights #7

Closed: kohleth closed this issue 6 years ago

kohleth commented 7 years ago

When weights are used in the C5.0 model and the model is then converted to a party object, the conversion does not seem to work.

library(C50)
library(partykit)

m1=C5.0(Species~.,data=iris,weights=1:nrow(iris))
m1p=as.party(m1)
m1p
predict(m1p,iris[1,])

You can see that in this case the party object predicts a numeric value instead of a class label.
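To make the symptom explicit, one can check the type of the returned prediction rather than eyeballing it. This is only a minimal sketch that reuses the model from the snippet above; the helper `is_class_prediction` is a hypothetical name and is not part of C50 or partykit, and the result depends on which C50 version is installed.

```r
library(C50)
library(partykit)

# Hypothetical helper (not part of C50 or partykit): TRUE when a prediction
# is a factor that carries the expected class levels.
is_class_prediction <- function(pred, expected_levels) {
  is.factor(pred) && identical(levels(pred), expected_levels)
}

# Same model and conversion as in the report above.
m1 <- C5.0(Species ~ ., data = iris, weights = 1:nrow(iris))
m1p <- as.party(m1)

# A correct conversion should give a factor with the Species levels;
# the behaviour reported here would give FALSE.
pred <- predict(m1p, iris[1, ])
is_class_prediction(pred, levels(iris$Species))
```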

mvculp commented 7 years ago

I ran the above code and I do see an issue. I think the problem is with the way C5.0 uses the weights, not with the party conversion. For example, take the previous code:

> library(C50)
> library(partykit)
> m1=C5.0(Species~.,data=iris,weights=1:nrow(iris))
> summary(m1)
...
(weights) > 100: virginica (83.1)
(weights) <= 100:
:...Petal.Length <= 1.9: setosa (16.9)
    Petal.Length > 1.9: versicolor (50)
...

Notice that the weights are being used as a predictor. In fact, C5.0 itself returns an empty prediction, factor(0):

> predict(m1,iris[1,])
factor(0)
Levels: setosa versicolor virginica

Now, to see that the conversion itself appears to work, choose weights that will not be picked up as a predictor.

> library(C50)
> library(partykit)
> 
> m2=C5.0(Species~.,data=iris,weights=c(2,rep(1,(nrow(iris)-1))))
> summary(m2)
...
Petal.Length <= 1.9: setosa (50.7)
Petal.Length > 1.9:
:...Petal.Width > 1.7: virginica (45.7/1)
    Petal.Width <= 1.7:
    :...Petal.Length <= 4.9: versicolor (47.7/1)
        Petal.Length > 4.9: virginica (6/2)
...
> m2p=as.party(m2)
> predict(m2p,iris[1,])
     1 
setosa 
Levels: setosa versicolor virginica

kohleth commented 7 years ago

Hi,

I don't know if this has to do with the version of C50, but I am using the version from GitHub (0.1.0-25), and my fitted model does not use the weights as a predictor (see the fix in issue #6):

> m1=C5.0(Species~.,data=iris,weights=1:nrow(iris))
> summary(m1)
...
Petal.Length <= 4.7:
:...Petal.Length <= 1.9: setosa (16.9)
:   Petal.Length > 1.9: versicolor (45.6/1.4)
Petal.Length > 4.7:
:...Petal.Width > 1.7: virginica (75.8/0.9)
    Petal.Width <= 1.7:
    :...Petal.Length <= 4.9: versicolor (2.7)
        Petal.Length > 4.9: virginica (9/2.1)

Of course, this doesn't rule out what you are saying, namely that it has to do with how C50 handles weights.
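Since the two results above differ by package version, a quick check like the following may help others compare. It is a rough sketch under two assumptions: that the fitted object stores its printed tree as a single string in `m1$output`, and that the text `(weights)` only appears there when the tree actually splits on the weights. The variable name `weights_used_as_predictor` is just illustrative.

```r
library(C50)

# Installed C50 version; this thread mentions 0.1.0-25 from GitHub.
packageVersion("C50")

m1 <- C5.0(Species ~ ., data = iris, weights = 1:nrow(iris))

# Assumption: m1$output holds the printed tree text. TRUE here would suggest
# the tree contains a split on the "(weights)" pseudo-attribute.
weights_used_as_predictor <- grepl("(weights)", m1$output, fixed = TRUE)
weights_used_as_predictor
```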

topepo commented 7 years ago

Yes, this should be fixed in the GitHub version. I'm having some issues with a CRAN release (arcane C issues), but it should be coming soon.

kohleth commented 7 years ago

Yes, the issue reported by mvculp does not show up in the GitHub version, but the initial issue I reported is still there.

mvculp commented 7 years ago

OK. So, my understanding is that there was an issue with the weights in C5.0 that has been fixed recently, but the fix is not yet in the latest CRAN release. That fix in turn caused a downstream issue with the party conversion.

I installed the latest version from GitHub and reproduced the reported issue (specifically, the weights become the response in the new version). I updated as.party.C5.0 to fix what I believe is the cause.

topepo commented 6 years ago

Is this resolved? I just re-ran the example with the current GitHub version and don't see an issue.

mvculp commented 6 years ago

Yes, I think so. I reran the code from the beginning of this issue and I get the correct answer.