mi2-warsaw / FSelectorRcpp

Rcpp (free of Java/Weka) implementation of FSelector entropy-based feature selection algorithms with sparse matrix support
http://mi2-warsaw.github.io/FSelectorRcpp/

include column with variable name in information_gain() output #49

Closed twolodzko closed 7 years ago

twolodzko commented 7 years ago

information_gain() returns a single-column data.frame with importance scores:

> information_gain(formula = Species ~ ., data = iris, type = "symuncert")
             importance
Sepal.Length  0.4155563
Sepal.Width   0.2452743
Petal.Length  0.8571872
Petal.Width   0.8705214

This output is, however, not very user-friendly, since the variable names are provided only as rownames. They should instead be provided in an additional column, which would make them easier to access from other functions.

Note that using rownames to carry additional information about the data is discouraged by many authors. Moreover, converting the information_gain() output to other objects, e.g. dplyr's tibble, could lead to the rownames being dropped.
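To make the concern concrete, here is a minimal base R sketch. It assumes a hand-built data frame, `scores`, copied from the output shown above rather than a real call to information_gain(). The column name `attributes` is a hypothetical choice for illustration only, not necessarily what the package would use.

```r
# Hand-built stand-in for the output above; no FSelectorRcpp call is made.
scores <- data.frame(
  importance = c(0.4155563, 0.2452743, 0.8571872, 0.8705214),
  row.names = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
)

# The variable names live only in the rownames, so they are lost as soon as
# the rownames are reset or dropped.
dropped <- scores
rownames(dropped) <- NULL

# Possible workaround: copy the rownames into an ordinary column.
# "attributes" is a hypothetical column name used only in this sketch.
tidy <- data.frame(
  attributes = rownames(scores),
  importance = scores$importance,
  row.names = NULL,
  stringsAsFactors = FALSE
)

# The names are now regular data and can be sorted or filtered on.
top <- tidy[order(tidy$importance, decreasing = TRUE), ]
print(top)
```

With the names stored in a column, downstream code can select or join on them without relying on rownames surviving a conversion.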

MarcinKosinski commented 7 years ago

The idea here is to stay as compatible as possible with the original FSelector output, since FSelectorRcpp is just a rewritten and updated version of another package, FSelector.

But adding an additional column might not be such a bad idea. @zzawadz what do you think?

zzawadz commented 7 years ago

Note that using rownames to carry additional information about the data is discouraged by many authors. Moreover, converting the information_gain() output to other objects, e.g. dplyr's tibble, could lead to the rownames being dropped.

Good point. I think that we shouldn't sacrifice usability for backward compatibility.

@twolodzko You're just in time. We are planning to submit our package to CRAN tomorrow ;) But I think it will contain that feature :)

zzawadz commented 7 years ago

@MarcinKosinski @twolodzko can you play a bit with the new version?

MarcinKosinski commented 7 years ago

I might prepare a new blog post about the release and play with some code. Does cut_attrs() already know that information_gain() now gives a different result?

Marcin Kosinski

On 04.03.2017 at 08:50, Zygmunt Zawadzki notifications@github.com wrote:

@MarcinKosinski @twolodzko can you play a bit with the new version?

twolodzko commented 7 years ago

@MarcinKosinski @zzawadz I'll play around and let you know if I have any further comments.