wiskott-lab / sklearn-sfa

This project provides Slow Feature Analysis as a scikit-learn-style package.
BSD 3-Clause "New" or "Revised" License

weights or coefficients same as delta values? #1

Closed sbhakat closed 3 years ago

sbhakat commented 3 years ago

Hi,

Just a query:

delta_values_ : array, shape (n_components,)
    The estimated delta values (mean squared time-difference) of
    the different components.

Do the delta values correspond to the weights (or coefficients) of the features associated with each slow component?

Stewori commented 3 years ago

I wouldn't call them "weights" because I think that's misleading; that term applies more to the extraction matrix. The same goes for "coefficients".

The delta values are more like a quality measure of the result (where "quality" means slowness). They tell you how slow the extracted features actually are (on the training data). The closer the delta values are to zero, the better the extraction worked (in terms of slowness). More precisely, the delta value of each output component is its squared time derivative, averaged over the training phase.

At the same time, the delta values are the eigenvalues of the covariance matrix of the signal's component-wise time derivative (after the signal was sphered/whitened). The mathematical link between these views is integration by parts.
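The first view can be made concrete in a few lines of NumPy. This is just the definition spelled out, not part of the sksfa API:

```python
import numpy as np

def delta_value(y):
    """Delta value of a 1-D signal: mean squared time difference."""
    return np.mean(np.diff(y) ** 2)

t = np.linspace(0, 2 * np.pi, 1000)
slow = np.sin(t)        # slowly varying signal
fast = np.sin(50 * t)   # quickly varying signal

# The slower signal has the smaller delta value.
assert delta_value(slow) < delta_value(fast)
```

For a unit-step ramp like `[0, 1, 2, 3]`, every time difference is 1, so the delta value is exactly 1.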

sbhakat commented 3 years ago

This might be a naive question, as I am still getting used to the SFA architecture.

[image]

So what is the physical interpretation of the delta values in the above plot when comparing them with the features?

'At the same time, the delta values are the eigenvalues of the covariance matrix of the signal's component-wise time derivative' — I am not sure about this statement. For example, we can do a PCA on a high-dimensional dataset and extract the coefficients/weights for PC1 and PC2. So for SFA, how can one get the weights corresponding to SF1 and SF2 for each feature?

Sorry if the phrasing is difficult to comprehend!

Stewori commented 3 years ago

> physical interpretation

I think that depends on the data or the observed system. There might be no "physical" interpretation at all if the system is something abstract. The best physical match to SFA delta values I can think of is kinetic energy (from quantum mechanics; in fact, for SFA the kinetic energy is at the same time the total energy). Maybe I misinterpret what you mean by "physical".

There are some generic interpretations, usually requiring additional assumptions. E.g., if one assumes that the input is composed of statistically independent sources, it is proven that certain SFA components correspond to these sources up to some strictly monotonic transformation. There is also a close connection to Laplacian eigenmaps. Another thing that can be said is that slowness is usually a good indicator of invariances, so the slowest feature is in a way the most invariant aspect of the observation.

However, it is impossible to give a reasonable overview of SFA interpretation here; to a significant extent it is still the subject of ongoing research. If you give some hints on what you are interested in, I can maybe suggest some references for you.

> not sure about this statement...

In PCA, too, each PC corresponds to an eigenvalue of a covariance matrix. The difference is that PCA takes the covariance matrix of the signal itself, while SFA takes that of the signal's time derivative. Also, SFA selects the smallest eigenvalues, while PCA takes the largest. Apart from this difference in ordering, SFA is much like PCA on the derivative.
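For illustration, here is a toy sketch of this view in plain NumPy (not sksfa's actual implementation): whiten the signal, then eigendecompose the covariance of its time derivative and keep the directions with the smallest eigenvalues, where PCA would keep the largest:

```python
import numpy as np

def linear_sfa(X, n_components):
    """Toy linear SFA: PCA on the derivative of the whitened signal,
    keeping the SMALLEST eigenvalues instead of the largest."""
    X = X - X.mean(axis=0)
    # Whiten via eigendecomposition of the covariance matrix.
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    Z = X @ (evecs / np.sqrt(evals))          # whitened signal
    # Covariance of the whitened signal's time derivative.
    dZ = np.diff(Z, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.cov(dZ, rowvar=False))
    # eigh sorts eigenvalues ascending, so the first columns are
    # the slowest directions (smallest delta values).
    return Z @ d_evecs[:, :n_components]

# Toy data: a slow and a fast sinusoid, linearly mixed.
t = np.linspace(0, 2 * np.pi, 2000)
sources = np.column_stack([np.sin(t), np.sin(25 * t)])
X = sources @ np.array([[1.0, 0.5], [0.3, 1.0]])

F = linear_sfa(X, 2)
delta = lambda y: np.mean(np.diff(y) ** 2)
assert delta(F[:, 0]) < delta(F[:, 1])   # first feature is slowest
```

On this linear mixture the first extracted feature comes out slowest, as the final assertion checks.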

> So for SFA, how can one get the weights corresponding to SF1 and SF2 for each feature?

I am not sure what you mean by weights/coefficients. Probably you mean the extraction matrix. This would correspond to the PCA coefficients that compose each output component linearly from each input (for non-linearity one usually expands the signal, e.g. by monomials of a certain degree). This matrix is currently not stored by this SFA implementation; it is computed on the fly from some previously computed values. See https://github.com/wiskott-lab/sklearn-sfa/blob/master/sksfa/_sfa.py#L417

In the MDP implementation of SFA, the extraction matrix is stored in the field `sf`: https://github.com/mdp-toolkit/mdp-toolkit/blob/master/mdp/nodes/sfa_nodes.py#L339 and the delta values are stored in `d`.

sbhakat commented 3 years ago

I should give a brief overview of the project. We want to implement a version of SFA in the MSMBuilder tICA format: http://msmbuilder.org/3.8.0/_decomposition/msmbuilder.decomposition.tICA.html#msmbuilder.decomposition.tICA (tICA is second-order ICA but with a lag time). If you visit the page you will see that the tICA object has attributes like `components_`, `eigenvalues_`, etc.

So we can actually plot the components of the tICA object:

[image]

This was defined like:

    @property
    def components_(self):
        return self.eigenvectors_[:, 0:self.n_components].T

see: https://github.com/msmbuilder/msmbuilder/blob/master/msmbuilder/decomposition/tica.py

I understand the principle of SFA, and I think it is now a bit clearer what we are looking for.

I think the following line of sksfa has the components (similar to tICA) for the SFA vectors:

    self.components_ = np.dot(np.dot(W_diff, np.diag(1/np.sqrt(self.pca_whiten_.explained_variance_))), W_whiten)[-self.n_components_:][::-1]

Let us try!!

mrschue commented 3 years ago

Hi @sbhakat,

@Stewori is right, the extraction matrix is currently not meant to be individually stored/re-used (I assume that you want the parameters of the linear/affine function that maps the SFA input to the actual slow features). However, the line of code that you found is an early implementation. It does not yet support anything but the simplest case; in particular, the matrix you found will not be accurate if you are using the fill-mode feature.

If you still want to use it, you should also make sure that your input data is mean-free. In .transform, this is done by the inner PCA transformers, but if you use `self.components_` directly, you have to do this yourself. You can probably just use `self.pca_whiten_.mean_` for that.
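To illustrate why the centering matters, here is a generic NumPy sketch with a random stand-in extraction matrix (hypothetical, not sksfa code): applying the matrix to raw, un-centered data shifts every output feature by the matrix times the training mean:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, size=(100, 3))   # data with a nonzero mean

mean_ = X.mean(axis=0)                   # stored at fit time
W = rng.normal(size=(2, 3))              # stand-in extraction matrix

# Correct: subtract the training mean before applying the matrix.
good = (X - mean_) @ W.T
# Wrong: applying the matrix to raw data leaves a constant offset.
bad = X @ W.T

# The offset between the two is exactly W @ mean_.
offset = bad.mean(axis=0) - good.mean(axis=0)
assert np.allclose(offset, W @ mean_)
```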

I will keep this issue as a To-Do item to finish this part.

sbhakat commented 3 years ago

@mrschue Thanks for the comments and suggestions. Yes, the input is already mean-free; MSMBuilder produces a mean-free, scaled output:

    from msmbuilder.preprocessing import RobustScaler
    scaler = RobustScaler()
    scaled_diheds = diheds.fit_transform_with(scaler, 'scaled_diheds/', fmt='dir-npy')

    print(diheds[0].shape)
    print(scaled_diheds[0].shape)

Just to cross-validate, we are also feeding in mean-free values (output of MSMBuilder printed to a file), something like:

#! time meanfree_sin_chi1_75 meanfree_sin_chi1_76 meanfree_sin_chi1_77 meanfree_sin_chi1_78 meanfree_sin_chi1_79 meanfree_sin_chi1_80 meanfree_sin_chi1_82 meanfree_sin_chi1_83 meanfree_sin_chi1_84 meanfree_cos_chi1_75 meanfree_cos_chi1_76 meanfree_cos_chi1_77 meanfree_cos_chi1_78 meanfree_cos_chi1_79 meanfree_cos_chi1_80 meanfree_cos_chi1_82 meanfree_cos_chi1_83 meanfree_cos_chi1_84 meanfree_sin_chi2_75 meanfree_sin_chi2_76 meanfree_sin_chi2_77 meanfree_sin_chi2_78 meanfree_cos_chi2_75 meanfree_cos_chi2_76 meanfree_cos_chi2_77 meanfree_cos_chi2_78 0.000000 0.569694 -0.563760 -0.293910 -0.085431 -0.851665 -0.932727 -0.414956 0.906188 -0.022867 0.485136 0.332040 1.086123 0.485252 0.581138 0.141012 0.432328 1.073254 -0.354269 -0.209285 0.913038 1.087512 -0.746633 -0.496575 0.551505 -0.306913 0.522497

So I think one can use `self.components_` directly. But we are still struggling to get attributes similar to tICA's: http://msmbuilder.org/3.8.0/_decomposition/msmbuilder.decomposition.tICA.html#msmbuilder.decomposition.tICA (see the attributes).

mrschue commented 3 years ago

@sbhakat the current testing branch should contain the functionality that I think you are looking for. After training, you can call the method affine_parameters of the SFA transformer object like this: `W, b = sfa.affine_parameters()`

The parameters are extracted such that `features = sfa.transform(data)` and `features = np.dot(data, W.T) + b` should be equivalent.

Note that this will currently only work for the standard fit method, not for the partial fit used by the HSFA implementation. Also, make sure that `fill_mode` is `None` or `"zero"`.
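The relation between the two forms can be sketched generically (with stand-in matrices, not the actual sksfa internals): if the transform centers the data and applies a matrix `C`, i.e. computes `(X - m) @ C.T`, then the equivalent affine parameters are `W = C` and `b = -C @ m`:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(2, 4))   # stand-in extraction matrix
m = rng.normal(size=4)        # stand-in training mean

def transform(X):
    """Stand-in for sfa.transform: center, then apply the matrix."""
    return (X - m) @ C.T

# Fold the centering into an affine map: W = C, b = -C @ m.
W, b = C, -C @ m

X = rng.normal(size=(10, 4))
assert np.allclose(transform(X), np.dot(X, W.T) + b)
```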

sbhakat commented 3 years ago

@mrschue Thanks a lot for the implementation. I am testing it now. I have checked the code and it is exactly what I was looking for. Thanks again