yngvem / group-lasso

Group Lasso implementation following the scikit-learn API
MIT License

GroupLasso silently assumes that the labels y are stored in a column vector #1

Closed: normanius closed this issue 5 years ago

normanius commented 5 years ago

Thanks for your work so far. I had a first look at your group-lasso tool. It looks as if it runs very fast, though I haven't tested the performance in detail yet.

I observed a small problem when y is stored in a 1-D array:

How to reproduce:

X = ...
y = ...
y = y.flatten()  # y is now 1-D, shape (n_samples,)

gl.fit(X, y)     # gl is a GroupLasso instance

This raises the following exception:

Traceback (most recent call last):
  File "trainExplore.py", line 73, in <module>
    executor.run()
  File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/executor.py", line 811, in run
    args.func(args)
  File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/executor.py", line 403, in _runSingle
    self.ret = self._functor(args.file, args.outDir, args, taskInfo)
  File "trainExplore.py", line 48, in runTraining
    returnStd=False)
  File "/Users/norman/workspace/education/phd/projects/geomtk/python/utilities/ml.py", line 338, in testBinaryClassifiers
    clf.fit(XTrain, yTrain)
  File "/Users/norman/workspace/dev/misc/python/group-lasso/group_lasso/_group_lasso.py", line 258, in fit
    self._fista(X, y, lipschitz_coef=lipschitz_coef)
  File "/Users/norman/workspace/dev/misc/python/group-lasso/group_lasso/_group_lasso.py", line 208, in _fista
    prox
  File "/Users/norman/workspace/dev/misc/python/group-lasso/group_lasso/_group_lasso.py", line 155, in _fista_it
    u_ = prox(v - grad(v)/L)
  File "/Users/norman/workspace/dev/misc/python/group-lasso/group_lasso/_group_lasso.py", line 185, in grad
    SSE_grad = _subsampled_l2_grad(X, w, y, self.subsampling_scheme)
  File "/Users/norman/workspace/dev/misc/python/group-lasso/group_lasso/_group_lasso.py", line 32, in _subsampled_l2_grad
    A, b = subsample(subsampling_scheme, A, b)
  File "/Users/norman/workspace/dev/misc/python/group-lasso/group_lasso/_subsampling.py", line 54, in subsample
    return _extract_from_singleton_iterable([X[inds, :] for X in Xs])
  File "/Users/norman/workspace/dev/misc/python/group-lasso/group_lasso/_subsampling.py", line 54, in <listcomp>
    return _extract_from_singleton_iterable([X[inds, :] for X in Xs])
IndexError: too many indices for array
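
The last frame of the traceback indexes every array with `X[inds, :]`, which needs two axes. The sketch below reproduces that failure with plain NumPy, without the library. The array contents and the `inds` values are made up for illustration and are not taken from the library.

```python
import numpy as np

# Illustrative data only: a column vector and its flattened 1-D copy.
y_column = np.arange(6.0).reshape(-1, 1)  # shape (6, 1)
y_flat = y_column.flatten()               # shape (6,)
inds = np.array([0, 2, 4])                # hypothetical subsample indices

# Two indices on a 2-D array: works and keeps the column shape.
rows = y_column[inds, :]

# Two indices on a 1-D array: NumPy raises IndexError.
try:
    y_flat[inds, :]
    raised = False
except IndexError as exc:
    raised = True
    print(f"IndexError: {exc}")
```

This suggests the failure comes from the shape of y alone, not from the data or the regularisation settings.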

You probably need a shape check on y at the beginning of the fit function. It's a minor detail, but I wanted to let you know.
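
A minimal sketch of such a check, assuming it is acceptable to reshape a 1-D y into a single column before fitting. The helper name `as_column_vector` is hypothetical and not part of group-lasso, and the maintainer may prefer a different fix.

```python
import numpy as np


def as_column_vector(y):
    """Hypothetical helper: return y as a 2-D array (n_samples, n_targets)."""
    y = np.asarray(y)
    if y.ndim == 1:
        # Treat a flat label vector as a single target column.
        return y.reshape(-1, 1)
    if y.ndim == 2:
        return y
    raise ValueError(f"Expected y to be 1-D or 2-D, got {y.ndim} dimensions")
```

Until something like this is in place, calling `gl.fit(X, y.reshape(-1, 1))` on the caller's side should avoid the error, because y then has the two axes the row indexing expects.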

Good luck with the further development!

normanius commented 5 years ago

The described problem is covered by the more general issue https://github.com/yngvem/group-lasso/issues/2