riken-aip / pyHSICLasso

Versatile Nonlinear Feature Selection Algorithm for High-dimensional Data
MIT License

A bug occurs when there are few explanatory variables #31

Open mamitsu2 opened 5 years ago

mamitsu2 commented 5 years ago

A bug occurs when there are only a few explanatory variables. Please fix it!

myamada0321 commented 5 years ago

Thanks for the comments.

Could you share the dataset you used and your setup? Then, we can reproduce the bug.

Thanks!

mamitsu2 commented 5 years ago

Sorry for not providing enough detail. I used sklearn.datasets.load_boston to test this module. With the following code, the bug does not occur.

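import numpy as np
import pandas as pd
from pyHSICLasso import HSICLasso
from sklearn.datasets import load_boston

dataset = load_boston()
# Build DataFrames for the features and the target
X1_ = pd.DataFrame(dataset.data, columns=dataset.feature_names)
y1_ = pd.DataFrame(dataset.target, columns=['y'])
X1_ = X1_.iloc[:, :]  # keep all 13 features
X1 = np.array(X1_)
y1 = np.array(y1_)
X1_col = X1_.columns
# Run HSIC Lasso regression over all features
hsic_lasso = HSICLasso()
hsic_lasso.input(X1, y1.flatten(), featname=list(X1_col))
hsic_lasso.regression(num_feat=X1.shape[1], discrete_x=False, n_jobs=2)
hsic_lasso.dump()
hsic_lasso.get_index_score()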

But if I reduce the number of explanatory variables as follows,

X1_ = X1_.iloc[:, :5]

the following error occurs:

ValueError: attempt to get argmax of an empty sequence

The error is raised here:

~/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/pyHSICLasso/nlars.py in nlars(X, X_ty, num_feat, max_neighbors)
     97         XtXbeta = np.dot(X.transpose(), np.dot(X, beta))
     98         c = X_ty - XtXbeta
---> 99         j = np.argmax(c[I])
    100         C = max(c[I])
    101 
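The error message itself can be reproduced without pyHSICLasso: np.argmax raises this ValueError whenever it receives an empty array, which is what c[I] becomes if the index array I is empty. A minimal sketch, using made-up values for c and a hypothetical helper name (neither is taken from nlars.py):

```python
import numpy as np

# Illustrative values only; these are not taken from nlars.py.
c = np.array([0.3, 0.1, 0.7])
I = np.array([], dtype=int)  # an empty set of candidate indices


def argmax_error_message(values, indices):
    """Hypothetical helper: return the ValueError message raised by
    np.argmax on values[indices], or None if no error is raised."""
    try:
        np.argmax(values[indices])
    except ValueError as exc:
        return str(exc)
    return None


message = argmax_error_message(c, I)
print(message)
```

With a non-empty index array such as np.array([0, 2]), the same helper returns None, so the failure depends only on I being empty.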
myamada0321 commented 5 years ago

Thanks for the detailed information. We will investigate this case.

muoten commented 4 years ago

I've got the same problem with another dataset with few explanatory variables.

I then replicated the error with sklearn.datasets.load_boston and 5 features. It seems that the array I becomes empty ([]) when lasso_cond=0, and this case is not handled in the while loop or anywhere else.
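One possible guard is sketched below. It assumes that an empty I means no candidate features remain and that the caller may stop the loop at that point; whether stopping early is correct for the solver is not confirmed in this thread. select_next_feature is a hypothetical helper, not part of pyHSICLasso.

```python
import numpy as np


def select_next_feature(c, I):
    """Hypothetical helper, not part of pyHSICLasso.

    Returns (j, C), where j is the position within I of the largest
    entry of c[I] and C is that value, or None when I is empty so the
    caller can stop instead of hitting the ValueError.
    """
    I = np.asarray(I, dtype=int)
    if I.size == 0:
        # Assumption: no candidates left, so signal the caller to stop.
        return None
    candidates = c[I]
    j = int(np.argmax(candidates))
    return j, float(candidates[j])


# Illustrative values only.
c = np.array([0.3, 0.1, 0.7])
empty_result = select_next_feature(c, [])
normal_result = select_next_feature(c, [0, 2])
```

A caller would check for None and break out of its loop; this mirrors the j and C values computed at lines 99 and 100 of the traceback, but only as an illustration.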

Any hints on how to fix this issue? I think the library is very interesting, and HSIC-based optimization may also be useful for datasets with few columns.

Thank you!

hclimente commented 4 years ago

Thanks for your input. We have been looking at alternative Lasso solvers. Unfortunately, we haven't found one that checks all of our boxes. We'll be on the lookout for new solvers that would address this issue.

renero commented 3 years ago

Thanks for reporting the bug. Has anyone solved it?