yngvem / group-lasso

Group Lasso implementation following the scikit-learn API
MIT License
105 stars 32 forks source link

LogisticGroupLasso.chosen_groups_ raises a TypeError #13

Closed normanius closed 3 years ago

normanius commented 4 years ago

First, a sign of gratitude: I finally had a look at the LogisticGroupLasso() estimator. Thanks so much for extending your library by LR classifier! So far, it works very well, and blends in perfectly with the sklearn infrastructure. Great job! (For my data, I currently get some convergence warnings from FIESTA, but I guess I have to play around with the parameters first, and that's not the issue here).


Problem: Calling LogisticGroupLasso.chosen_groups_ raises the following exception:

Traceback (most recent call last):
  File "example_logistic_group_lasso.py", line 90, in <module>
    print(gl.chosen_groups_)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/group_lasso/_group_lasso.py", line 442, in chosen_groups_
    return set(np.unique(self.groups_[self.sparsity_mask_]))
TypeError: only integer scalar arrays can be converted to a scalar index

One can reproduce the problem in example_group_lasso.py.

Is it correct that chosen_groups is supposed to return the group ids which have been selected by group lasso. Isn't this information reflected also by the gl.group_reg_vector_?

yngvem commented 4 years ago

Oh, I see what the error is here. Thanks for finding it!

For now, you can use the code snippet set(np.unique(gl.groups[gl.sparsity_mask_])) (notice, no underscore after groups).

I will patch it and write a regression test as soon as I have the time.

yngvem commented 4 years ago

As for the convergence warnings with FISTA. That is not a surprise for logistic regression, especially if you subsample. From personal experience, the chosen groups will not change much if you iterate until convergence, but their weight may. It is therefore sufficient to run the group lasso estimator it in a pipeline like described in this example.

In fact, if you use the SAGA solver for L1 regularised logistic regression in scikit-learn, then it may also not converge for a fair amount of datasets (a colleague of mine tested this, and got different chosen covariate with different random inits).

yngvem commented 4 years ago

I just uploaded a patch to PyPI to fix this, you can upgrade pip install --upgrade group-lasso. I'm glad you're finding it useful!

normanius commented 4 years ago

Sorry to bug you further on this, but the current fix seems not to work for the example example_logistic_group_lasso.

The fix probably solves the problem for group-lasso regression, but not for group-lasso logistic regression.

Traceback (most recent call last):
  File "example_logistic_group_lasso.py", line 91, in <module>
    print(gl.chosen_groups_)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/group_lasso/_group_lasso.py", line 459, in chosen_groups_
    return set(np.unique(self.groups.ravel()[sparsity_mask.ravel()]))
IndexError: boolean index did not match indexed array along dimension 0; dimension is 720 but corresponding boolean dimension is 1440

Your initial suggestion to use set(np.unique(gl.groups[gl.sparsity_mask_])) was working. The problem is connected to the fact that the following is not equivalent.

    # In property sparsity_mask_:
    sparsity_mask = self._get_chosen_coef_mask(self.coef_.mean(1))
    # In property chosen_groups_:
    sparsity_mask = self._get_chosen_coef_mask(self.coef_)

From what I understand, _group_lasso:L458 should be:

    # Current:
    # return set(np.unique(self.groups.ravel()[sparsity_mask.ravel()]))
    # Fix?
    return set(np.unique(self.groups.ravel()[self.sparsity_mask_.ravel()]))

(Though I'm not 100% sure if ravel() is required at all.)

normanius commented 4 years ago

Another suggestion:

BaseGroupLasso implicitly expects the constructor argument groups to be an ndarray. If groups is a list, there will be another error:

Traceback (most recent call last):
  File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/executor.py", line 198, in _worker
    result = func(*args)
  File "/Users/norman/workspace/education/phd/projects/geomtk/studies/morphology/scripts/trainExplore.py", line 73, in runTraining
    returnStd=False)
  File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/ml.py", line 503, in testBinaryClassifiers
    clfLabel=name)
  File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/ml.py", line 197, in trainCrossValidated
    groups = estimator.chosen_groups_
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/group_lasso/_group_lasso.py", line 459, in chosen_groups_
    return set(np.unique(self.groups[self.sparsity_mask_]))
TypeError: only integer scalar arrays can be converted to a scalar index

I therefore suggest to somewhere ensure self.groups to be an ndarray: self.groups = np.asarray(self.groups)

Maybe in BaseGroupLasso.__init__()?

yngvem commented 4 years ago

No need to apologise, I appreciate that you find my code useful and help me discover bugs! I got a lot on my plate with teaching duties right now, but I'll see if I can give this some time and look at it more carefully before next week.

Thanks a lot for your feedback!

SolidOrange commented 4 years ago

I'm getting an attribute error when trying to view LogisticGroupLasso.chosengroups.

AttributeError: 'list' object has no attribute 'ravel'

yngvem commented 3 years ago

Thank you! I have fixed this bug and it will be pushed shortly (hopefully, but I need to test it first). In the meantime, you can use an array of group ids instead of a list to avoid that bug.

yngvem commented 3 years ago

This should be fixed now