riken-aip / pyHSICLasso

Versatile Nonlinear Feature Selection Algorithm for High-dimensional Data
MIT License

Block Lasso selects fewer features than vanilla algorithm #35

Closed pechom closed 5 years ago

pechom commented 5 years ago

When I used block Lasso to select 77 features out of 770 (threshold of 10%), I got only 57 features. The block size B was a divisor of the number of data instances. However, when I set B to zero, I got exactly 77 features. Is it normal for block Lasso to return fewer features? This happened when I used the permutation parameter M with value one.

The other difference is that when I use vanilla Lasso I get the following warning:

C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: divide by zero encountered in true_divide
  gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])

Block Lasso produced no warnings.

Then I tried block Lasso with M=2. I got 77 features, but also the following warnings:

C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: invalid value encountered in true_divide
  gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:83: RuntimeWarning: invalid value encountered in less_equal
  gamma[gamma <= 1e-9] = np.inf
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:85: RuntimeWarning: invalid value encountered in less
  mu = min(gamma)
C:\Program Files\Python37\lib\site-packages\pyHSICLasso\nlars.py:77: RuntimeWarning: divide by zero encountered in true_divide
  gamma1 = (C - c[I]) / (XtXw[A[0]] - XtXw[I])

Finally, I tried M=3. I also got 77 features, with the same warning as vanilla Lasso.

I have two questions. Should I use M=1, which gives no warnings but fewer features, or M=3, which gives the same warning as vanilla Lasso? Are these warnings of any importance, or are they within normal expected behavior?

UPDATE: I then tried to select 9200 features from 92000 with block Lasso using B=19 and M=3, but I got even fewer features than before: only 33. Should I scale M with the number of features?

hclimente commented 5 years ago

Hello pechom,

The parameter M is the number of permutations to run. The larger it is, the more closely block HSIC Lasso will approximate vanilla HSIC Lasso's solution, but more memory and longer runtimes will also be required. In all the cases we examined, M = 3 produced a reasonable approximation, even in settings with over 300k features. However, as a rule of thumb, M (and likewise B) should take the highest value your computer can handle (up to the point where you could actually run vanilla HSIC Lasso). So you should definitely increase it if you are able to and the current configuration is not producing a good enough solution.
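To illustrate why a larger M tightens the approximation, here is a minimal numpy sketch (not pyHSICLasso's actual code): block HSIC Lasso estimates its statistic on shuffled blocks of B samples and averages over M permutations, so the average concentrates around the full-data value as M grows. The statistic below (a plain correlation standing in for an HSIC estimator), the sizes, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.1 * rng.normal(size=200)

def pairwise_stat(a, b):
    # Full-data statistic: correlation, a stand-in for an HSIC estimator.
    return float(np.corrcoef(a, b)[0, 1])

def block_estimate(a, b, B, M, rng):
    # Shuffle the samples, split them into blocks of size B, estimate the
    # statistic per block, and average over all blocks of all M permutations.
    n = len(a)
    vals = []
    for _ in range(M):
        idx = rng.permutation(n)
        for start in range(0, n, B):
            blk = idx[start:start + B]
            vals.append(pairwise_stat(a[blk], b[blk]))
    return float(np.mean(vals))

full = pairwise_stat(x, y)
err1 = abs(block_estimate(x, y, B=20, M=1, rng=rng) - full)
err5 = abs(block_estimate(x, y, B=20, M=5, rng=rng) - full)
```

With B equal to the full sample size, a single "block" contains every sample and the estimate matches the full-data statistic exactly, which mirrors the B=0 (vanilla) behavior described above.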

Regarding the warnings, they are part of the expected behavior, and pyHSICLasso handles them appropriately under the hood.
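A minimal sketch of why these warnings are benign, based on the nlars.py lines quoted in the warnings themselves (the inputs here are made-up toy values): the vectorized division produces inf (or nan) entries wherever a denominator is zero, and the subsequent filtering step sets invalid step sizes to inf, so they can never win the `min` that selects the next step.

```python
import numpy as np
import warnings

num = np.array([0.4, 0.2, 0.1])
den = np.array([0.0, 0.5, 0.2])  # zero denominator -> divide-by-zero warning

with warnings.catch_warnings():
    # Suppress the same RuntimeWarnings reported in the issue.
    warnings.simplefilter("ignore", RuntimeWarning)
    gamma = num / den              # inf where den == 0
    gamma[gamma <= 1e-9] = np.inf  # discard non-positive step sizes too
    mu = np.min(gamma)             # inf entries never win the min

# mu comes out as 0.4 (= 0.2 / 0.5), unaffected by the inf entry
```

In other words, the division is allowed to overflow, and the bookkeeping right after it throws those entries away, which is why the selected features are the same whether or not the warnings appear.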

pechom commented 5 years ago

Thanks for the reply and explanation.

I will try raising the value of M for 500k features; hopefully I will get enough features before I run out of memory.