vonjd / OneR

This R package implements the One Rule (OneR) Machine Learning classification algorithm with enhancements for sophisticated handling of numeric data and missing values together with extensive diagnostic functions.

multivariate #14

Open ggrothendieck opened 1 year ago

ggrothendieck commented 1 year ago

1) I suggest extending OneR to the multivariate situation, where the dependent variable is a vector for each case. For example, using the built-in anscombe data frame and manova in base R, we can find which of x1, x2, x3 and x4 best predict all of the target variables y1, y2, y3 and y4 together (as opposed to performing 4 separate runs and possibly getting a different best variable for each). In the example below we find that x1 is the best variable to use if we can use only one variable to predict all 4 y variables. (x2 and x3 do not appear in the output because in anscombe they are identical to x1 and are therefore aliased.) Had we run 4 separate lm's, different variables might have turned out best for different target variables, but manova reveals which ones are best overall.

fo <- cbind(y1, y2, y3, y4) ~ x1 + x2 + x3 + x4
summary(manova(fo, anscombe))
##           Df  Pillai approx F num Df den Df   Pr(>F)   
## x1         1 0.93473  17.9026      4      5 0.003631 **
## x4         1 0.76783   4.1341      4      5 0.075826 . 
## Residuals  8                                           
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If OneR supported this, one could run the following, where mOneR is a multivariate OneR:

mOneR(fo, anscombe)
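The idea can be sketched in base R without the OneR package. The following is a minimal, hypothetical sketch (the names `discretize`, `one_rule_accuracy` and `moner_sketch` are made up for illustration and are not part of OneR). It cuts each numeric variable into 3 equal-width bins and gives each predictor a one-rule that maps every predictor bin to the most frequent target bin. It then chooses the predictor whose accuracy, averaged over all targets, is highest. The equal-width binning and the averaging of accuracies are assumptions; OneR's own binning and tie handling may differ. Because this criterion is accuracy rather than Pillai's trace, it need not agree with the manova result.

```r
# Hypothetical sketch of a multivariate OneR-style selection (not the OneR package API).
# Assumption: numeric variables are discretized into `nbins` equal-width intervals.
discretize <- function(x, nbins = 3) {
  if (is.numeric(x)) cut(x, breaks = nbins) else as.factor(x)
}

# Accuracy of a one-rule: each predictor level predicts its most frequent target level.
one_rule_accuracy <- function(x, y) {
  tab <- table(x, y)
  sum(apply(tab, 1, max)) / length(y)
}

# For every candidate predictor, average the one-rule accuracy over all targets
# and return the predictor with the highest mean accuracy.
moner_sketch <- function(data, predictors, targets, nbins = 3) {
  binned <- lapply(data[c(predictors, targets)], discretize, nbins = nbins)
  # acc: rows = targets, columns = predictors
  acc <- sapply(predictors, function(p)
    sapply(targets, function(t) one_rule_accuracy(binned[[p]], binned[[t]])))
  list(accuracy = acc, best = names(which.max(colMeans(acc))))
}

res <- moner_sketch(anscombe, c("x1", "x2", "x3", "x4"), c("y1", "y2", "y3", "y4"))
res$accuracy
res$best
```

Since x1, x2 and x3 are identical in anscombe, they receive identical accuracies; which.max() simply returns the first of any tied predictors.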

2) Perhaps one could also specify whether a single variable should be chosen for all target variables, as above, or whether the selection should be run separately for each target variable. The latter case corresponds to using lm instead of manova:

summary(lm(fo, anscombe))

This is equivalent to running 4 separate lm fits, but it can be expressed more compactly in one line.
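A quick base-R check of this equivalence, offered as an illustration: the coefficient matrix of the multivariate lm fit (one column per response) matches the coefficients of four separate lm fits. Because x2 and x3 are identical to x1 in anscombe, their coefficients are NA (aliased) in both cases.

```r
fo <- cbind(y1, y2, y3, y4) ~ x1 + x2 + x3 + x4

# One multivariate fit: a 5 x 4 coefficient matrix, one column per response
cf_mlm <- coef(lm(fo, anscombe))

# Four separate univariate fits, assembled column by column
cf_sep <- sapply(c("y1", "y2", "y3", "y4"), function(y)
  coef(lm(reformulate(c("x1", "x2", "x3", "x4"), y), anscombe)))

# TRUE: same estimates, with NA in the same (aliased) positions
all.equal(cf_mlm, cf_sep, check.attributes = FALSE)
```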

In this case one would run

OneR(fo, anscombe)

and it would just be a more compact way of running against each target variable separately:

f <- function(y) OneR(reformulate(c("x1", "x2", "x3", "x4"), y), anscombe)
Map(f, c("y1", "y2", "y3", "y4"))
vonjd commented 1 year ago

Thank you for your suggestions; I will look into it!