rikhuijzer / SIRUS.jl

Interpretable Machine Learning via Rule Extraction
https://sirus.jl.huijzer.xyz/
MIT License

Allow `Int` classes #46

rikhuijzer closed this 1 year ago

rikhuijzer commented 1 year ago

Antonello Lobianco noticed that the outcome classes are typically shown as floats (for example, 1.0 and 0.0), whereas integers would be much more suitable. For example, this was the output for the Haberman dataset:

StableRules model with 8 rules:
 if X[i, :nodes] < 8.0 then 0.156 else 0.031 +
 if X[i, :nodes] < 14.0 then 0.164 else 0.026 +
 if X[i, :nodes] < 4.0 then 0.128 else 0.037 +
 if X[i, :nodes] ≥ 8.0 & X[i, :age] < 38.0 then 0.0 else 0.008 +
 if X[i, :year] ≥ 1966.0 & X[i, :age] < 42.0 then 0.0 else 0.005 +
 if X[i, :nodes] < 2.0 then 0.107 else 0.034 +
 if X[i, :year] ≥ 1966.0 & X[i, :age] < 38.0 then 0.0 else 0.001 +
 if X[i, :year] < 1959.0 & X[i, :nodes] ≥ 2.0 then 0.0 else 0.003
and 2 classes: [0.0, 1.0].
Note: showing only the probability for class 1.0 since class 0.0 has
      probability 1 - p.

This PR changes that to

StableRules model with 8 rules:
 if X[i, :nodes] < 8.0 then 0.156 else 0.031 +
 if X[i, :nodes] < 14.0 then 0.164 else 0.026 +
 if X[i, :nodes] < 4.0 then 0.128 else 0.037 +
 if X[i, :nodes] ≥ 8.0 & X[i, :age] < 38.0 then 0.0 else 0.008 +
 if X[i, :year] ≥ 1966.0 & X[i, :age] < 42.0 then 0.0 else 0.005 +
 if X[i, :nodes] < 2.0 then 0.107 else 0.034 +
 if X[i, :year] ≥ 1966.0 & X[i, :age] < 38.0 then 0.0 else 0.001 +
 if X[i, :year] < 1959.0 & X[i, :nodes] ≥ 2.0 then 0.0 else 0.003
and 2 classes: [0, 1].
Note: showing only the probability for class 1 since class 0 has probability 1 - p.

Much better like this. Thank you, @sylvaticus!
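
For readers unfamiliar with the printed format, here is a minimal sketch of how the eight rules could be read. It assumes that the trailing `+` means the rule outputs are summed and that the sum is the score for class 1, as the note in the output suggests. The `row` values and the `score` helper are hypothetical and not part of SIRUS.jl.

```julia
# Hypothetical single observation; the values are made up for illustration.
row = (nodes = 3.0, age = 50.0, year = 1960.0)

# Each printed rule as (condition, value if true, value if false).
rules = [
    (r -> r.nodes < 8.0, 0.156, 0.031),
    (r -> r.nodes < 14.0, 0.164, 0.026),
    (r -> r.nodes < 4.0, 0.128, 0.037),
    (r -> r.nodes ≥ 8.0 && r.age < 38.0, 0.0, 0.008),
    (r -> r.year ≥ 1966.0 && r.age < 42.0, 0.0, 0.005),
    (r -> r.nodes < 2.0, 0.107, 0.034),
    (r -> r.year ≥ 1966.0 && r.age < 38.0, 0.0, 0.001),
    (r -> r.year < 1959.0 && r.nodes ≥ 2.0, 0.0, 0.003),
]

# Assumption: the model output is the sum of the per-rule values.
score(row) = sum(cond(row) ? a : b for (cond, a, b) in rules)

p = score(row)  # score for class 1; class 0 would be 1 - p
```

Under these assumptions, only the first three rules fire for this row, and the remaining five contribute their `else` values.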