Open krzyslom opened 7 years ago
@krzyslom I think FSelectorRcpp completely removes rows with NAs. Can you provide a summary of the behaviour of FSelector in this case?
From FSelector:::information.gain.body -> FSelector:::discretize.all -> FSelector:::supervised.discretization I see:
function (formula, data)
{
    data = get.data.frame.from.formula(formula, data)
    complete = complete.cases(data[[1]])
    all.complete = all(complete)
    if (!all.complete) {
        new_data = data[complete, , drop = FALSE]
        result = Discretize(formula, data = new_data, na.action = na.pass)
        return(result)
    }
    else {
        return(Discretize(formula, data = data, na.action = na.pass))
    }
}
<environment: namespace:FSelector>
So FSelector removes only the rows where the dependent variable is NA.
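To make the effect of that filter concrete, here is a minimal base-R sketch that applies the same `complete.cases(data[[1]])` step to a toy data frame. The data frame and the names `df`, `complete`, and `kept` are made up for illustration, and it assumes (as the code above suggests) that the dependent variable ends up in the first column.

```r
# Toy data (hypothetical): one NA in the response, one NA in a predictor.
df <- data.frame(
  y  = factor(c("a", "b", NA, "a", "b")),
  x1 = c(1.0, NA, 3.0, 4.0, 5.0)
)

# Same filter as in supervised.discretization: only the first column
# (assumed to be the dependent variable) is checked for NAs.
complete <- complete.cases(df[[1]])
kept <- df[complete, , drop = FALSE]

nrow(kept)      # 4: only the row with NA in y is dropped
anyNA(kept$x1)  # TRUE: the row with NA in x1 is still there
```

Rows with missing predictors therefore reach the discretization step unchanged.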
So the only thing left to check is how FSelector (through its interface to `RWeka::Discretize`) deals with NAs in the explanatory variables:
> RWeka::Discretize
An R interface to Weka class 'weka.filters.supervised.attribute.Discretize', which has
information
An instance filter that discretizes a range of numeric attributes in the dataset
into nominal attributes. Discretization is by Fayyad & Irani's MDL method (the
default).
For more information, see:
Usama M. Fayyad, Keki B. Irani: Multi-interval discretization of continuousvalued
attributes for classification learning. In: Thirteenth International Joint
Conference on Articial Intelligence, 1022-1027, 1993.
Igor Kononenko: On Biases in Estimating Multi-Valued Attributes. In: 14th
International Joint Conference on Articial Intelligence, 1034-1040, 1995.
BibTeX:
@INPROCEEDINGS{Fayyad1993,
publisher = {Morgan Kaufmann Publishers},
year = {1993},
pages = {1022-1027},
author = {Usama M. Fayyad and Keki B. Irani},
title = {Multi-interval discretization of continuousvalued attributes for
classification learning},
volume = {2},
booktitle = {Thirteenth International Joint Conference on Articial Intelligence},
}
@INPROCEEDINGS{Kononenko1995,
year = {1995},
pages = {1034-1040},
PS = {http://ai.fri.uni-lj.si/papers/kononenko95-ijcai.ps.gz},
author = {Igor Kononenko},
title = {On Biases in Estimating Multi-Valued Attributes},
booktitle = {14th International Joint Conference on Articial Intelligence},
}
Argument list:
x(formula, data, subset, na.action, control = NULL)
Returns objects inheriting from classes:
Discretize data.frame
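The output above does not say how missing predictor values are treated, so one way to find out is to try it. The following is only a sketch: it assumes RWeka (and Java) is installed, uses `iris` with a few artificially injected NAs, and passes `na.action = na.pass` as FSelector does. The names `d` and `res` are made up for illustration, and no result is claimed here.

```r
# Copy of iris with a few NAs injected into one explanatory variable.
d <- iris
d$Sepal.Length[c(1, 60, 120)] <- NA

# Run only when RWeka is available, since it needs a working Java setup.
if (requireNamespace("RWeka", quietly = TRUE)) {
  res <- RWeka::Discretize(Species ~ ., data = d, na.action = na.pass)

  # Were any rows dropped?
  print(nrow(res) == nrow(d))

  # Are the NAs still present in the discretized column?
  print(anyNA(res$Sepal.Length))
}
```

Comparing the row count and the NA positions before and after would show whether rows are removed or the NAs are passed through.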
This corresponds to issue #51.