mi2-warsaw / FSelectorRcpp

Rcpp (Java/Weka-free) implementation of FSelector entropy-based feature selection algorithms, with sparse matrix support
http://mi2-warsaw.github.io/FSelectorRcpp/

Rewrite exhaustive.search function #13

Closed DSkrzypiec closed 7 years ago

DSkrzypiec commented 8 years ago

As a first subtask, which could be useful in other tasks, I have rewritten R's combn() function. It returns the matrix of all k-subsets of a given set. The C++ template class Subset() does exactly the same job as R's combn() and can be found in inst/include/exhaustive.search/Subset.h.

@MarcinKosinski , @zzawadz

MarcinKosinski commented 8 years ago

@zzawadz you are the main architect here :P

zzawadz commented 8 years ago

Oki.

But rewriting exhaustive.search in C++ might be hard, because we have to evaluate the user's function :( The line `eval.fun(attributes[as.logical(subset)])` is tricky.

Of course it is possible to pass an R function into Rcpp and then evaluate it (http://gallery.rcpp.org/articles/r-function-from-c++/). But with this approach we might have a problem adding a parallel backend.

We should discuss this on Saturday;)

MarcinKosinski commented 8 years ago

Saturday or Sunday? I thought we planned the discussion for Sunday? :p


zzawadz commented 8 years ago

I will look at this function and try to propose a nice parallel backend. I don't think we can get a significant boost from rewriting this in C++, but a small cluster is always a nice thing :D

I need some time to run experiments and read other people's code, though.

MarcinKosinski commented 8 years ago

For parallelism you can check the cv.glmnet/glmnet parallel user interface:

require(doMC)
registerDoMC(cores = 2)

cvfit <- cv.glmnet(x, y, family = "multinomial", type.multinomial = "grouped",
                   parallel = TRUE)

https://cran.r-project.org/web/packages/glmnet/vignettes/glmnet_beta.html#log


zzawadz commented 8 years ago

@MarcinKosinski could you provide some real test examples for greedy_search and exhaustive_search?

The example from FSelector works pretty well (on my machine :)) with a simple parallel backend, but I might have missed something...

MarcinKosinski commented 8 years ago

Already added to my life-backlog!


MarcinKosinski commented 8 years ago

@zzawadz what would you say about minimizing the number of functions where possible, as we did by gathering gain ratio and symmetrical uncertainty into information_gain?

What would you say about gathering exhaustive_search and greedy_search (and maybe, in the future, best_first_search) into one function with a type argument?

Propositions for a new name are welcome.

MarcinKosinski commented 8 years ago

@zzawadz greedy_search and exhaustive_search could have the same output. Right now it differs:

exhaustive

Exhaustive Search Result:

     Sepal.Length + Petal.Width

  Results for other attribute subsamples are available.
  You can extract them with x[["allResult"]]

greedy

> greedy_search(names(iris)[-5], evaluator, iris, allowParallel = FALSE)
$result
[1] 0.9538829

$attrs
[1] 1 0 0 1

Imho the approach in exhaustive is too fancy; it could be more R-ish. I mean the result could be a list, which is more convenient for future automation and pipelining.

The results of exhaustive search and greedy search could both be lists with (if possible) the same names.

Proposition for the value returned by these 2 functions: an object of class feature_search/selection_result with slots.

The output will then always have the same format, which will make it easier to work with the package, to explain its functionality, and to automate these functions in pipelines :)

What do you think?

zzawadz commented 8 years ago

I think that I need to think, but probably you're right:)

Creating user interfaces is hard stuff for me;)

MarcinKosinski commented 7 years ago

@krzyslom please unify the output of greedy and exhaustive

MarcinKosinski commented 7 years ago

Thanks @krzyslom for unifying the result of those 2 functions. I have updated and extended the documentation for this function and renamed a few parameters and names in the returned value.

This function is checked and ready to go.