To solve this, one needs to implement a .get_params() method.
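For reference, a minimal sketch of what scikit-learn expects (the class name is purely illustrative; in practice, inheriting from sklearn.base.BaseEstimator provides get_params()/set_params() for free, as long as __init__ only stores its arguments unchanged):

```python
from sklearn.base import BaseEstimator

class IllustrativeSelector(BaseEstimator):
    """Illustrative only -- BaseEstimator derives get_params()/set_params()
    from the __init__ signature, provided each argument is stored unchanged."""
    def __init__(self, n_estimators=100, perc=100):
        self.n_estimators = n_estimators
        self.perc = perc

# GridSearchCV clones estimators via get_params(), e.g.:
print(IllustrativeSelector(perc=90).get_params())
# -> {'n_estimators': 100, 'perc': 90}
```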
If you submit a PR for this I'm more than happy to accept it, but I'll be very busy in the coming months, unfortunately. Cheers
Okay, let's have a deal: I will submit a PR for this and you upload it to PyPI? ;)
Alright, deal :) (but only in the 2nd or 3rd week of Jan)
Sorry to interrupt, but this seems like a pretty bad idea. The point of Boruta is that it is an all relevant method, so it should be optimised for robust selection rather than for the lowest post-selection error (which is what GridSearch does). I am afraid that adding such a pursuit will simply degenerate the method into an incredibly inefficient random sampler.
Since Miron is the author of the Boruta algorithm, I'll trust him on this one. Unless you can convince him @MaxBenChrist :)
First, merry Christmas to you all! :)
Hi @mbq, first, nice paper + algorithm. Regarding your doubts about using Boruta in a Grid Search:
Let's say we have a simple pipeline: first Boruta, then a classifier C. A grid search over many different folds optimizes the parameters of Boruta for the best performance of classifier C. Now you fear that Boruta will become a random sampler; I am not sure why.
How else should one determine the parameters of Boruta (e.g. the max_depth or the number of estimators of the random forest) if not on the final classifier performance? On real-world data sets, we don't know which features are relevant and which are not. This is a general problem: if I have a pipeline with a feature selection algorithm, I have to optimize all the parameters, including those of the feature selection step, at the same time. I can't optimize the parameters of the feature selection step alone because there is no score / loss function for it.
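For context, the kind of usage being discussed would look roughly like the sketch below (a sketch only; it assumes BorutaPy exposes perc and n_estimators constructor arguments, and it is exactly the fit call that currently fails with the missing get_params error reported in this issue):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from boruta import BorutaPy

pipe = Pipeline([
    ("boruta", BorutaPy(RandomForestClassifier(n_jobs=-1), n_estimators=100)),
    ("clf", RandomForestClassifier(n_jobs=-1)),  # the classifier C
])

# Tune the selection step on C's cross-validated score -- the approach debated here.
param_grid = {"boruta__perc": [80, 90, 100], "clf__n_estimators": [100, 300]}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X, y) raises "object has no attribute 'get_params'" as long as
# BorutaPy does not implement the scikit-learn estimator interface.
```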
Happy holidays!
The problem here is that you assume the classification error is minimal on the relevant set of features, which is false in general (because of redundant features, classifier characteristics, noise, overfitting, etc.). That's why there are two classes of FS methods, "minimal optimal" and "all relevant" (nice paper about this, with Bayesian net definitions).
It is obviously true that optimising an all relevant method is basically next to hopeless. Still, you may aim at robustness (like stability of the selection under perturbations of the input set; though it is only an upper bound, since a perfectly stable method may still select junk), use some domain knowledge, or get some generally robust methods and hope for the best. The latter is what Boruta does: for RF classification max_depth is infinity by design (sic, otherwise this is just some random CART ensemble), the default m is rarely significantly suboptimal, and finally RF is expected to converge with the number of trees, so overshooting n only costs time.
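A minimal sketch of those "sane defaults" (assuming BorutaPy accepts an n_estimators override; this is not a tuning recipe, just the configuration described above):

```python
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

# Fully grown trees (max_depth=None), the default max_features (m), and a
# generous number of trees -- overshooting n_estimators only costs time.
rf = RandomForestClassifier(n_jobs=-1, max_depth=None)
selector = BorutaPy(rf, n_estimators=500, random_state=42)
# selector.fit(X, y) would then flag all-relevant features in selector.support_
```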
@mbq The paper is great. I studied it in detail and learned a lot. Thank you for that!
The problem here is that you assume the classification error is minimal on the relevant set of features, which is false in general (because of redundant features, classifier characteristics, noise, overfitting, etc.).
Actually, I deploy complex machine learning pipelines that contain both "all relevant" and "minimal optimal" or heavily regularized classifiers together. I create a huge number of features and then use multiple layers of filtering/regularization/feature selection.
When you say that
optimising an all relevant method is basically next to hopeless.
Are you referring to Corollary 14 from the Nilsson paper here? (The all-relevant problem requires exhaustive subset search.)
Even then, I think your pipeline would benefit more from Boruta with some sane default params as a filter of irrelevant attributes than from Boruta with parameters tuned to yield the best accuracy (because I think it would mostly degenerate Boruta into returning a few most obvious features or even pure noise); but I may be wrong, with such a meta-meta approach anything is possible.
Are you referring to Corollary 14 from the Nilsson paper here? (The all-relevant problem requires exhaustive subset search.)
Well, no, rather that all relevant selection does not optimise error, thus it is hard to assess how all relevant some selection really is. Also, Nilsson et al. consider the asymptotic, perfect case when you have near perfect conditional probability estimates -- the whole Boruta mess is motivated by the fact that this is really hard to achieve in problems that need feature selection.
So what is your final position on the .get_params() method? :) Will you merge a PR containing such a method (triggering a warning about the all-relevant issues when used)?
When using boruta_py in a sklearn grid search, the error
object has no attribute 'get_params'
occurs. It would be interesting if one could also optimize the parameters of the Boruta feature selection.