UnixJunkie opened this issue 2 months ago
Did you ever see this while trying to fit a model?
I can't remember finding such an error. What is the shape of data you are using to train? How many data points and how many features?
160 datapoints in the training set. 4428 features each.
I don't think this is the cause of your error. Nevertheless, I think you will severely overfit with such a large feature space and so few samples. Maybe you can prefilter the feature space.
There is a regularization parameter; can't I tune it to prevent overfitting, i.e. use stronger regularization?
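For illustration, here is the effect that a ridge regularization strength has in plain kernel ridge regression, implemented directly in NumPy. This is a hedged sketch: MLKRR's own regularization parameter is assumed to play an analogous role, but its exact name and behavior should be checked in the class documentation.

```python
# Sketch: effect of the ridge regularization strength lambda in plain
# kernel ridge regression (NumPy only).  MLKRR's regularization parameter
# (exact name not confirmed here) is assumed to behave analogously.
import numpy as np

def fit_predict_krr(X, y, lam, gamma=0.1):
    # RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    # Ridge solution: alpha = (K + lam * I)^-1 y
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return K @ alpha  # predictions on the training set

rng = np.random.default_rng(0)
X = rng.normal(size=(160, 20))               # small n, as in this issue
y = X[:, 0] + 0.1 * rng.normal(size=160)

for lam in (1e-6, 1e-2, 10.0):
    mse = np.mean((fit_predict_krr(X, y, lam) - y) ** 2)
    print(f"lambda={lam:g}  train MSE={mse:.4f}")
# Larger lambda -> worse training fit, but less risk of overfitting
# when n_samples << n_features.
```

The training error grows monotonically with lambda; the useful value is the one that minimizes error on held-out data, not on the training set.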
This error might have been caused by a lack of RAM on the computer where I was trying it. I might be able to confirm this soon.
This also happens on a very-high-memory compute cluster node, so available memory is not the issue here.
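A quick back-of-envelope on array sizes at these shapes supports that conclusion. The (n², d) intermediate below is a guess about how a metric-learning gradient might be materialized, not a confirmed detail of the MLKRR implementation:

```python
# Back-of-envelope memory estimate for n=160 samples, d=4428 features,
# float64 (8 bytes per entry).  The (n^2, d) buffer is an *assumption*
# about the gradient computation, not a confirmed MLKRR internal.
n, d = 160, 4428
bytes_per = 8

A = d * d * bytes_per        # learned d x d transformation matrix
G = n * n * d * bytes_per    # hypothetical pairwise-gradient buffer

print(f"d x d matrix:    {A / 1e9:.2f} GB")   # ~0.16 GB
print(f"(n^2, d) buffer: {G / 1e9:.2f} GB")   # ~0.91 GB
```

Both are well under 1 GB each, so an out-of-memory failure at these shapes would indeed be surprising.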
Are there limitations I am not aware of even after having read the paper, e.g. could a matrix with only positive integer values be problematic?
Are the features supposed to be 0-centered and have unit variance?
Cc. @puckvg
No issues with only positive integer values, and there are no algorithmic restrictions related to the variance, although it is probably a good idea to normalize your features beforehand.
The regularization might help with overfitting but we did not test it extensively.
The Iris dataset that is used as an example does not have zero mean and unit variance, but MLKRR works on it.
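A minimal sketch of the normalization suggested above (zero mean, unit variance per feature) in NumPy; the epsilon guard is an addition of mine to handle constant columns, which a raw count matrix can easily contain:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Center each column to zero mean and scale to unit variance.

    eps guards against division by zero for constant features
    (e.g. all-zero columns in a sparse count matrix).
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / np.maximum(sigma, eps)

X = np.array([[1.0, 2.0, 5.0],
              [3.0, 2.0, 7.0],
              [5.0, 2.0, 9.0]])
Xs = standardize(X)
print(Xs.mean(axis=0))  # ~ [0, 0, 0]
print(Xs.std(axis=0))   # ~ [1, 0, 1]  (the constant column stays at 0)
```

Note that the same mean and standard deviation computed on the training set should be reused to transform the test set.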
I will try lowering the dimensionality of my dataset (keeping only the most frequent 720 features).
The current implementation works up to 1500 dimensions; starting from 1550 dimensions, it crashes.
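The prefiltering mentioned above (keeping only the 720 most frequent features) could be sketched as follows, assuming "most frequent" means the columns that are nonzero in the most training samples; the helper name and the selection criterion are my assumptions, not part of MLKRR:

```python
# Hedged sketch: keep the k columns that are nonzero in the most rows.
# The criterion ("most frequent" = most often nonzero) is an assumption.
import numpy as np

def keep_most_frequent(X, k):
    freq = (X != 0).sum(axis=0)        # per-feature occurrence count
    top = np.argsort(freq)[::-1][:k]   # indices of the k most frequent
    top.sort()                         # preserve original column order
    return X[:, top], top

rng = np.random.default_rng(0)
X = (rng.random((160, 4428)) < 0.05).astype(float)  # sparse 0/1 features
X_small, kept = keep_most_frequent(X, 720)
print(X_small.shape)  # (160, 720)
```

The returned `kept` index array must be stored so the same columns can be selected from any future test data.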