lcmd-epfl / MLKRR

Code for the Metric Learning for Kernel Ridge Regression algorithm
MIT License

ValueError: _lbfgsb._lbfgsb.setulb: failed to create array from the 10th argument `wa` -- 0-th dimension must be fixed to 666123109 but got 4961090405 [bug] #10

Open UnixJunkie opened 2 months ago

UnixJunkie commented 2 months ago

New sigma: 1.0 (took 32.67 s)
Traceback (most recent call last):
  File "/home/fbr/src/fp_bench/./rfr.py", line 273, in <module>
    r2 = mlkrr_train_test_UCAP(train_fn, test_fn)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fbr/src/fp_bench/./rfr.py", line 181, in mlkrr_train_test_UCAP
    model.fit(X_train, y_train)
  File "/home/fbr/src/fp_bench/mlkrr.py", line 213, in fit
    res = minimize(
          ^^^^^^^^^
  File "/usr/lib/python3/dist-packages/scipy/optimize/_minimize.py", line 710, in minimize
    res = _minimize_lbfgsb(fun, x0, args, jac, bounds,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/scipy/optimize/_lbfgsb_py.py", line 356, in _minimize_lbfgsb
    _lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
ValueError: _lbfgsb._lbfgsb.setulb: failed to create array from the 10th argument `wa` -- 0-th dimension must be fixed to 666123109 but got 4961090405
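
The two sizes in the message differ by exactly 2**32, which suggests the requested workspace length wrapped around in a 32-bit integer. The sketch below checks that arithmetic. It also assumes that SciPy sizes the `wa` workspace as `2*m*n + 5*n + 11*m*m + 8*m` for `n` parameters and `m` corrections (my reading of `_lbfgsb_py.py`, not confirmed in this thread) and that `m` is the default `maxcor` of 10. The helper name `lbfgsb_wa_size` is hypothetical.

```python
import math

# Numbers copied from the error message.
expected = 666123109   # length the compiled wrapper insisted on
got = 4961090405       # length of the array that was actually passed

# The two differ by exactly 2**32: the true length wrapped around
# when it was stored in a 32-bit integer.
assert got - expected == 2**32
assert got % 2**32 == expected


def lbfgsb_wa_size(n, m=10):
    """Assumed length of the `wa` workspace for n parameters, m corrections."""
    return 2 * m * n + 5 * n + 11 * m * m + 8 * m


# Solve the assumed formula for n, with m = 10 (assumed default `maxcor`).
m = 10
n = (got - 11 * m * m - 8 * m) // (2 * m + 5)
assert lbfgsb_wa_size(n, m) == got

side = math.isqrt(n)
print(n, side, side * side == n)  # prints "198443569 14087 True"
```

Under those assumptions the optimizer was handed 198443569 parameters, a perfect square with side 14087. That would correspond to a flattened 14087 x 14087 matrix, which is an inference from the numbers and not something reported in this thread.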
UnixJunkie commented 2 months ago

Did you ever see this error while trying to fit a model?

raimon-fa commented 2 months ago

I can't remember finding such an error. What is the shape of data you are using to train? How many data points and how many features?

UnixJunkie commented 2 months ago

160 datapoints in the training set. 4428 features each.

(Sent as an email reply on Thu, Jun 6, 2024.)

raimon-fa commented 2 months ago

I don't think this is the cause of your error. Nevertheless, I think you will severely overfit with such a large feature space (4428 features) and so few samples (160). Maybe you can prefilter the feature space.

UnixJunkie commented 2 months ago

You have a regularization parameter. Can't I use it to prevent overfitting, i.e. apply stronger regularization?


UnixJunkie commented 2 months ago

This error might have been caused by a lack of RAM on the machine I was using. I will try to confirm this soon.

UnixJunkie commented 2 months ago

This also happens on a very high-memory compute cluster node, so available memory is not the issue here.

UnixJunkie commented 2 months ago

Are there limitations I am not aware of, even after having read the paper? For example, could a feature matrix with only positive integer values be problematic?

UnixJunkie commented 2 months ago

Are the features supposed to be 0-centered and have unit variance?

UnixJunkie commented 2 months ago

Cc. @puckvg

raimon-fa commented 2 months ago

No issues with only positive integer values, and there are no algorithmic restrictions related to the variance, although it is probably a good idea to normalize your features beforehand.
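
For reference, a minimal sketch of the kind of normalization meant here, assuming plain NumPy arrays and zero-mean, unit-variance scaling computed from the training set only. The function name `standardize` and the toy data are illustrative and not part of MLKRR.

```python
import numpy as np


def standardize(X_train, X_test, eps=1e-12):
    """Scale features to zero mean and unit variance using training statistics."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Constant columns have std == 0; only centre them instead of dividing by zero.
    std = np.where(std < eps, 1.0, std)
    return (X_train - mean) / std, (X_test - mean) / std


# Toy data: positive integer features, as in the question above.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 5, size=(160, 8)).astype(float)
X_test = rng.integers(0, 5, size=(40, 8)).astype(float)

X_train_s, X_test_s = standardize(X_train, X_test)
```

The test set is scaled with the training mean and standard deviation so that no information from the test data leaks into the fit.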

raimon-fa commented 2 months ago

The regularization might help with overfitting, but we did not test it extensively.

UnixJunkie commented 2 months ago

The Iris dataset that is used as an example does not have zero mean and unit variance, yet MLKRR works on it.

UnixJunkie commented 2 months ago

I will try lowering the dimensionality of my dataset (keeping only the 720 most frequent features).
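
One way such a filter could look, as a sketch only: it assumes "most frequent" means non-zero in the largest number of training rows, which is my interpretation and not stated above. The helper name `most_frequent_features` and the toy matrix are hypothetical.

```python
import numpy as np


def most_frequent_features(X_train, k):
    """Return sorted column indices of the k features non-zero in the most rows."""
    counts = np.count_nonzero(X_train, axis=0)
    # Stable sort on negated counts: ties keep the lower column index.
    order = np.argsort(-counts, kind="stable")
    return np.sort(order[:k])


# Toy example: 3 samples, 4 count-like features.
X_train = np.array([[1, 0, 2, 0],
                    [3, 0, 0, 0],
                    [1, 1, 4, 0]])

keep = most_frequent_features(X_train, k=2)
X_reduced = X_train[:, keep]
```

The indices in `keep` are chosen on the training set and should then be applied unchanged to the test set, e.g. `X_test[:, keep]`.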

UnixJunkie commented 2 months ago

The current implementation works up to 1500 dimensions. Starting from 1550 dimensions, it crashes.