mi2-warsaw / FSelectorRcpp

Rcpp (free of Java/Weka) implementation of FSelector entropy-based feature selection algorithms with sparse matrix support
http://mi2-warsaw.github.io/FSelectorRcpp/

Add `integer2numeric` to `information_gain` and `discretize` functions. #74

Closed: zzawadz closed this issue 6 years ago

zzawadz commented 6 years ago

Integer vectors are quite problematic. Sometimes they should be treated as a numeric variable (so they should be discretized before calculating the information gain), but sometimes they encode a factor variable (e.g., days of the week, gender, and so on), in which case discretizing them is not a good idea ;)

I'll add `integer2numeric` to control this behavior, and warn users that this might be a problem, because it seems they're not aware of it.

Also, it makes the package a bit more consistent, because right now `information_gain` leaves integer columns as is, but `discretize` discretizes them :(
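
To make the ambiguity concrete, here is a minimal, language-neutral sketch in Python (not the package's R/C++ code). Only the name `integer2numeric` comes from this issue. The meaning given to it here (`True` means "treat integers as numeric and discretize first") is an assumption, and the `entropy`, `equal_width_bins`, and `information_gain` helpers are hypothetical. Equal-width binning stands in for whatever discretizer the package actually uses.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def equal_width_bins(x, nbins):
    """Map numeric values to bin indices 0..nbins-1.

    Stand-in discretizer for illustration only; not the package's method.
    """
    lo, hi = min(x), max(x)
    if lo == hi:
        return [0] * len(x)
    width = (hi - lo) / nbins
    return [min(int((v - lo) / width), nbins - 1) for v in x]


def information_gain(x, y, integer2numeric=False, nbins=2):
    """Information gain H(y) + H(x) - H(x, y).

    integer2numeric=True (assumed semantics): treat the integer column as
    numeric and discretize it before computing the entropies.
    integer2numeric=False: treat each distinct integer as a factor level.
    """
    if integer2numeric:
        x = equal_width_bins(x, nbins)
    return entropy(y) + entropy(x) - entropy(list(zip(x, y)))


# Day of week encoded as 1..7; the (made-up) shop is closed on days 1 and 7.
days = [1, 2, 3, 4, 5, 6, 7]
closed = ["yes", "no", "no", "no", "no", "no", "yes"]

gain_factor = information_gain(days, closed)                         # integers as levels
gain_numeric = information_gain(days, closed, integer2numeric=True)  # integers binned
print(gain_factor, gain_numeric)
```

In this toy data the day fully determines the target when it is kept as a factor, so the gain equals the entropy of `closed`. Binning days 1 to 7 into two numeric intervals mixes open and closed days in each bin, and most of that information is lost, which is why the choice should be explicit.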

zzawadz commented 6 years ago

Closed with https://github.com/mi2-warsaw/FSelectorRcpp/pull/75