The ridge regression solution with regularization reg reads:
coef = linalg.solve(X.T @ X + reg * np.eye(n_features), X.T @ y)
so to solve it you just need X.T @ X and X.T @ y.
X.T @ X is nothing more than
\sum_i np.outer(X[i], X[i])  # a sum of rank-1 (n_features x n_features) matrices
where the sum runs over samples, which are time points in your case.
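A minimal sketch of this online accumulation (illustrative only, not from the thread; ridge_online is a hypothetical name, and it assumes X is (n_samples, n_features) and y is (n_samples, n_targets)):

import numpy as np
from scipy import linalg

def ridge_online(X, y, reg):
    # Accumulate X.T @ X and X.T @ y one sample (time point) at a time.
    # In practice each row of X would be generated on the fly, so the
    # full design matrix never has to be allocated at once.
    n_features = X.shape[1]
    xtx = np.zeros((n_features, n_features))
    xty = np.zeros((n_features, y.shape[1]))
    for x_i, y_i in zip(X, y):
        xtx += np.outer(x_i, x_i)  # rank-1 update
        xty += np.outer(x_i, y_i)
    return linalg.solve(xtx + reg * np.eye(n_features), xty)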
Does it help @adam2392?
Ah yeah I see what you mean.
So I can see that this will definitely help when the model order is high. I suspect, though, that if the RAM usage isn't too high, forming X and X.T explicitly is not too bad and possibly faster?
What would be a good cutoff in estimated RAM usage to automatically switch to the for loop over samples that constructs the X.T @ X matrix?
Or should we allow users to decide with a kwarg?
I would have only one implementation, using the online approach. It would be a lot easier to test, and it should be systematically faster.
I'm working on this right now, and I would prefer copying the code over from statsmodels rather than making statsmodels a dependency, because the statsmodels API is more R-like than Pythonic. Moreover, the statsmodels VAR code is not very well documented in its docstrings.
However, I'll probably copy a decent chunk of code if I also allow us to do order selection using information criteria such as AIC/BIC/etc.
Is this okay, or should I just hack together a wrapper of statsmodels to handle the RAM issue?
@agramfort @larsoner
I would copy the relevant code and avoid a dependency on statsmodels, but I would also add a test that we don't deviate from statsmodels, so that if they fix a bug we detect it.
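A regression test along these lines could pin our solution to statsmodels (a sketch; the data shapes and tolerance are placeholders, and it assumes statsmodels' default 'c' trend, where each predictor row is [1, y[t-1], ..., y[t-p]]):

import numpy as np
from numpy.testing import assert_allclose
from statsmodels.tsa.api import VAR

def test_ols_matches_statsmodels():
    rng = np.random.RandomState(0)
    data = rng.randn(200, 3)  # (n_times, n_channels)
    p = 2
    # Build the lagged predictor matrix Z the way statsmodels does:
    # one row [1, y[t-1], ..., y[t-p]] per usable time point.
    Z = np.array([np.concatenate(([1.0], data[t - p:t][::-1].ravel()))
                  for t in range(p, len(data))])
    Y = data[p:]
    coef = np.linalg.solve(Z.T @ Z, Z.T @ Y)  # plain OLS, i.e. reg = 0
    assert_allclose(coef, VAR(data).fit(p).params, rtol=1e-8)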
@agramfort I'm looking at the statsmodels code that I repurposed, and I realized they don't actually do the above (loop over samples/channels): https://www.statsmodels.org/stable/_modules/statsmodels/tsa/vector_ar/var_model.html#VAR
They actually form the Z predictor matrix explicitly.
Is there an example package you were thinking of that does the above, but for lags as well? Handling the VAR(1) case by directly forming Z.T @ Z and Z.T @ Y is easy enough, but it gets a bit more involved with lags.
@agramfort I'm trying to implement the for loop you suggested above to see if it improves the RAM usage for high n_channels/n_samples.
Do you have any tips on how to do it when lags are present? I got stuck wondering whether this is even possible with lags.
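For what it's worth, one possible way to extend the online accumulation to a VAR(p) model (a sketch, not a reply from the thread; var_gram_online is a hypothetical name, and it assumes the statsmodels-style predictor [1, y[t-1], ..., y[t-p]]):

import numpy as np
from scipy import linalg

def var_gram_online(data, p, reg=0.0):
    # data: (n_times, n_channels). Accumulate Z.T @ Z and Z.T @ Y one
    # time point at a time, without ever materializing Z.
    n_times, n_ch = data.shape
    n_feat = 1 + p * n_ch  # intercept plus p lags of every channel
    ztz = np.zeros((n_feat, n_feat))
    zty = np.zeros((n_feat, n_ch))
    for t in range(p, n_times):
        # predictor row at time t: [1, y[t-1], y[t-2], ..., y[t-p]]
        z_t = np.concatenate(([1.0], data[t - p:t][::-1].ravel()))
        ztz += np.outer(z_t, z_t)
        zty += np.outer(z_t, data[t])
    return linalg.solve(ztz + reg * np.eye(n_feat), zty)

The memory cost is then O((1 + p * n_channels)^2) for the Gram matrix, independent of n_times.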
I suspect that using sklearn here is very memory inefficient. To compute a ridge solution you need to assemble X.T @ X and X.T @ y and then solve a linear system. Time series packages assemble X.T @ X and X.T @ y on the fly by looping over the signal's time stamps. This avoids a huge matrix allocation for storing X (data with all delays etc.).
_Originally posted by @agramfort in https://github.com/mne-tools/mne-connectivity/pull/23#discussion_r688336154_