Closed normanius closed 3 years ago
Oh, I see what the error is here. Thanks for finding it!
For now, you can use the code snippet set(np.unique(gl.groups[gl.sparsity_mask_]))
(notice, no underscore after groups).
I will patch it and write a regression test as soon as I have the time.
As for the convergence warnings with FISTA. That is not a surprise for logistic regression, especially if you subsample. From personal experience, the chosen groups will not change much if you iterate until convergence, but their weight may. It is therefore sufficient to run the group lasso estimator it in a pipeline like described in this example.
In fact, if you use the SAGA solver for L1 regularised logistic regression in scikit-learn, then it may also not converge for a fair amount of datasets (a colleague of mine tested this, and got different chosen covariate with different random inits).
I just uploaded a patch to PyPI to fix this, you can upgrade pip install --upgrade group-lasso
. I'm glad you're finding it useful!
Sorry to bug you further on this, but the current fix seems not to work for the example example_logistic_group_lasso
.
The fix probably solves the problem for group-lasso regression, but not for group-lasso logistic regression.
Traceback (most recent call last):
File "example_logistic_group_lasso.py", line 91, in <module>
print(gl.chosen_groups_)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/group_lasso/_group_lasso.py", line 459, in chosen_groups_
return set(np.unique(self.groups.ravel()[sparsity_mask.ravel()]))
IndexError: boolean index did not match indexed array along dimension 0; dimension is 720 but corresponding boolean dimension is 1440
Your initial suggestion to use set(np.unique(gl.groups[gl.sparsity_mask_]))
was working. The problem is connected to the fact that the following is not equivalent.
# In property sparsity_mask_:
sparsity_mask = self._get_chosen_coef_mask(self.coef_.mean(1))
# In property chosen_groups_:
sparsity_mask = self._get_chosen_coef_mask(self.coef_)
From what I understand, _group_lasso:L458 should be:
# Current:
# return set(np.unique(self.groups.ravel()[sparsity_mask.ravel()]))
# Fix?
return set(np.unique(self.groups.ravel()[self.sparsity_mask_.ravel()]))
(Though I'm not 100% sure if ravel() is required at all.)
Another suggestion:
BaseGroupLasso
implicitly expects the constructor argument groups
to be an ndarray. If groups
is a list, there will be another error:
Traceback (most recent call last):
File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/executor.py", line 198, in _worker
result = func(*args)
File "/Users/norman/workspace/education/phd/projects/geomtk/studies/morphology/scripts/trainExplore.py", line 73, in runTraining
returnStd=False)
File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/ml.py", line 503, in testBinaryClassifiers
clfLabel=name)
File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/ml.py", line 197, in trainCrossValidated
groups = estimator.chosen_groups_
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/group_lasso/_group_lasso.py", line 459, in chosen_groups_
return set(np.unique(self.groups[self.sparsity_mask_]))
TypeError: only integer scalar arrays can be converted to a scalar index
I therefore suggest to somewhere ensure self.groups
to be an ndarray: self.groups = np.asarray(self.groups)
Maybe in BaseGroupLasso.__init__()
?
No need to apologise, I appreciate that you find my code useful and help me discover bugs! I got a lot on my plate with teaching duties right now, but I'll see if I can give this some time and look at it more carefully before next week.
Thanks a lot for your feedback!
I'm getting an attribute error when trying to view LogisticGroupLasso.chosengroups.
AttributeError: 'list' object has no attribute 'ravel'
Thank you! I have fixed this bug and it will be pushed shortly (hopefully, but I need to test it first). In the meantime, you can use an array of group ids instead of a list to avoid that bug.
This should be fixed now
First, a sign of gratitude: I finally had a look at the LogisticGroupLasso() estimator. Thanks so much for extending your library by LR classifier! So far, it works very well, and blends in perfectly with the sklearn infrastructure. Great job! (For my data, I currently get some convergence warnings from FIESTA, but I guess I have to play around with the parameters first, and that's not the issue here).
Problem: Calling
LogisticGroupLasso.chosen_groups_
raises the following exception:One can reproduce the problem in
example_group_lasso.py
.Is it correct that
chosen_groups
is supposed to return the group ids which have been selected by group lasso. Isn't this information reflected also by thegl.group_reg_vector_
?