Open larskotthoff opened 7 years ago
For some functions - yes - see e.g. https://github.com/mi2-warsaw/FSelectorRcpp/blob/master/tests/testthat/test-information_gain.R
This is part of the code:
test_that("Comparison with FSelector", {
  expect_equal(information.gain(Species ~ ., data = iris)$attr_importance,
               information_gain(formula = Species ~ ., data = iris)$importance)
  expect_equal(gain.ratio(Species ~ ., data = iris)$attr_importance,
               information_gain(formula = Species ~ ., data = iris,
                                type = "gainratio")$importance)
  expect_equal(symmetrical.uncertainty(Species ~ .,
                                       data = iris)$attr_importance,
               information_gain(formula = Species ~ ., data = iris,
                                type = "symuncert")$importance)
})
For other functions, please send us a list of the functionality that must be checked against FSelector, and we will prepare the required tests to convince you that everything is fine :)
I'd love to see tests for all of the functions that users can call, ideally on a range of different inputs. Maybe using quickcheck (https://github.com/RevolutionAnalytics/quickcheck)?
Oh and once I'm convinced I'm willing to officially deprecate FSelector in favour of FSelectorRcpp.
Ok. We will work on this.
Thanks!
@zzawadz another amazing challenge for FSelectorRcpp : )
Maybe the easiest way would be to include FSelectorRcpp in FSelector.
@MarcinKosinski Good idea. We can replace the inner implementation of FSelector's functions step by step until the two packages converge. @larskotthoff What do you think?
Sounds good. Pull requests welcome!
So this can be closed - https://github.com/mi2-warsaw/FSelectorRcpp/issues/27 : ) @larskotthoff is aware that we will propose replacing the inner implementation.
Getting back to this thread. FSelectorRcpp will be available on CRAN again soon (it was removed because of missing information about a C++ dependency): https://github.com/mi2-warsaw/FSelectorRcpp/issues/69
To make FSelectorRcpp a part of the FSelector engine, I think we could try substituting the FSelector:::information.gain.body() function with FSelectorRcpp::information_gain(). We need to polish FSelectorRcpp so that it produces the same results as FSelector, and also enable other approaches to dealing with NAs and to discretization of the dependent variable.
Two tasks need to be finished for that:
Enable dependent variable discretization the same as FSelector:::equal.frequency.binning.discretization - FSelectorRcpp does not provide discretization for the dependent variable. To make it compatible with FSelector, we will add an extra option to discretize the dependent variable:
FSelector:::information.gain.body <- function(params, equal = TRUE) {
  FSelectorRcpp::information_gain(params, equal = equal)
}
Enable FSelectorRcpp to deal with NAs in explanatory variables as RWeka::Discretize does - we need to slightly reorganize the code so that we only remove rows that have NAs in the dependent variable (and not in any variable considered for discretization, as was done before), and so that we can provide exactly the same explanatory variable discretization as RWeka::Discretize.
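The NA handling proposed in the second task can be illustrated in base R. This is only a sketch of the row-filtering idea, not FSelectorRcpp's actual code: the helper name drop_na_in_target and the toy data frame are made up for illustration.

```r
# Hypothetical helper (not part of FSelector or FSelectorRcpp):
# keep every row whose dependent variable is observed, even if some
# explanatory variables are missing in that row.
drop_na_in_target <- function(data, target) {
  data[!is.na(data[[target]]), , drop = FALSE]
}

# Toy data: one NA in the dependent variable y, two NAs in the predictors.
df <- data.frame(
  y  = c("a", NA, "b", "a"),
  x1 = c(1.5, 2.0, NA, 4.0),
  x2 = c(NA, 1L, 2L, 3L)
)

kept <- drop_na_in_target(df, "y")
nrow(kept)         # 3 rows: only the row with a missing y is dropped
nrow(na.omit(df))  # 1 row: dropping on any NA discards far more data
```

Filtering on the dependent variable alone leaves the missing predictor values in place, so the discretization step still has to decide how to treat them.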
Hi, I am struggling to get the same results from FSelectorRcpp and FSelector - posted under this issue: https://github.com/mlr-org/mlr/issues/1677#issuecomment-431234791. The results I get are actually very different, and the impact on an end model is large. Would appreciate your help if I am doing anything wrong. Thanks!
@RandomGuessR
FSelectorRcpp treats integer columns like factors, not numerics, so it does not discretize them before calculating the information gain. You need to cast the integer columns to numeric to get the same result. See the code below:
library(FSelectorRcpp)
library(FSelector)
dt <- read.csv("~/Downloads/all/train.csv")
dt2 <- data.frame(
  yy = dt$target,
  X0deb4b6a8 = dt$X0deb4b6a8,
  X0deb4b6a8Numeric = as.numeric(dt$X0deb4b6a8)
)
information_gain(yy ~ ., dt2, equal = TRUE)
#          attributes  importance
# 1        X0deb4b6a8 0.001443917
# 2 X0deb4b6a8Numeric 0.000000000
information.gain(yy ~ ., dt2)
#                   attr_importance
# X0deb4b6a8                      0
# X0deb4b6a8Numeric               0
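With many integer columns, the cast shown above can be applied to the whole data frame at once. The following is a base R sketch; the helper name integers_to_numeric and the toy data are hypothetical and not part of either package.

```r
# Hypothetical helper: convert every integer column to double and
# leave all other columns (factors, doubles, characters) untouched.
integers_to_numeric <- function(data) {
  is_int <- vapply(data, is.integer, logical(1))  # FALSE for factors
  data[is_int] <- lapply(data[is_int], as.numeric)
  data
}

# Toy data with one factor, one integer and one double column.
df <- data.frame(
  target = factor(c("a", "b", "a")),
  count  = c(1L, 5L, 9L),
  ratio  = c(0.1, 0.5, 0.9)
)

converted <- integers_to_numeric(df)
vapply(converted, function(col) class(col)[1], character(1))
```

The converted data frame can then be passed to information_gain() in place of the original one, so that former integer columns are discretized like any other numeric column.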
Thanks for helping with this so quickly! Might be good to document this difference somewhere in the package(s)
Kudos to Zzawadz
@RandomGuessR @MarcinKosinski
I found inconsistent behavior in FSelectorRcpp :( The information_gain function does not discretize integers, but discretize does. I consider this a bug, and I'll fix it.
Thanks @zzawadz.
After changing the data from integer to numeric, FSelectorRcpp works like a treat; really happy with the performance. The RWeka-based implementation was too slow for most real-world practical purposes.
FSelectorRcpp will try to mimic the behavior of FSelector, so since 0.3.0 integers will be treated as numerics by default, not factors.
Do you guys have any tests to check this? We're thinking of replacing FSelector with FSelectorRcpp in mlr, but we'd like to be sure that we remain reproducible.
@berndbischl