Closed stefdoerr closed 8 years ago
Looks correct. You could simply use numpy.testing.assert_almost_equal on the result to check if you obtain the original data.
Ah yes, if I use all the dimensions. Valid point ;) Thanks.
Hm, I am getting weird behaviour from the TICA class.
I ask for all dimensions TICA(20, 595)
and indeed if I do tica.dimension()
I get 595.
However if I do tica.transform(data)
it returns 528 dimensional data.
What is the reason for this?
Hi Stefan, I think I have an idea, can you please post the original TICA-call?
tica = TICA(20, 595)
proj1 = tica.fit_transform(datalist)
proj2 = tica.transform(otherdataarray)
That's normal and correct. It is necessary to reduce the rank of the matrices to the numerical rank in order to solve the problem. You generally can not expect that the output dimension equals the input dimension (unless your problem is low-dimensional to start with).
Am 16/12/15 um 16:07 schrieb Stefan:
Hm, I am getting weird behaviour from the TICA class. I ask for all dimensions |TICA(20, 595)| and indeed if I do |tica.dimension()| I get 595. However if I do |tica.transform(data)| it returns 528 dimensional data. What is the reason for this?
— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/653#issuecomment-165135512.
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354 Web: research.franknoe.de
So if I am not mistaken, it has to do with the epsilon argument, that is, to ignore TICA-vectors that have eigenvalues smaller than that epsilon.
We seem to have intended to document this at some moment: the docs for transform say "see epsilon", but then nothing is mentioned. I'm going to post the code where epsilon happens, perhaps we should improve the docs
Example:
In [7]: tica = TICA(20, 595)
In [8]: tica.dimension()
Out[8]: 595
In [9]: np.shape(data.dat.tolist())
Out[9]: (2130, 500, 595)
In [10]: dataticalist = tica.fit_transform(data.dat.tolist())
calculate covariances: 100% (2130/2130) [#############################################################################################################################] eta 00:00 |
In [11]: np.shape(dataticalist)
Out[11]: (2130, 500, 528)
wow! 3 guys posting simultaneously
From the generalized eigenvalue problem-solver https://github.com/markovmodel/PyEMMA/blob/21bbe600d044a2e0e59b280dba6e2510ba2a5158/pyemma/util/linalg.py#L117
epsilon : float
eigenvalue norm cutoff. Eigenvalues of C0 with norms <= epsilon will be
cut off. The remaining number of Eigenvalues define the size of
the output.
Okay, then maybe tica.dimension()
should return the actual number of dimensions it will project on. Maybe even add some logging.info
if you want saying that the number of dimensions were reduced to X.
Am 16/12/15 um 16:34 schrieb Stefan:
Okay, then maybe |tica.dimension()| should return the actual number of dimensions it will project on.
Yes! If that is not the case, it is a bug.
Maybe even add some |logging.info| if you want saying that the number of dimensions will be reduced.
— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/653#issuecomment-165145489.
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354 Web: research.franknoe.de
And perhaps add the role of epsilon to the docs . @stefdoerr , just out curiosity, can you call transform with an epsilon close to zero, say 1e-17?
@franknoe It's a bug then:
In [15]: tica = TICA(20, 595)
In [16]: tica.dimension()
Out[16]: 595
In [17]: dataticalist = tica.fit_transform(data.dat.tolist())
calculate covariances: 100% (2130/2130) [#############################################################################################################################] eta 00:01 |
In [18]: tica.dimension()
Out[18]: 595
In [19]: np.shape(dataticalist)
Out[19]: (2130, 500, 528)
@gph82 doesn't seem to help
In [20]: tica = TICA(20, 595, epsilon=1E-17)
In [21]: tica.dimension()
Out[21]: 595
In [22]: dataticalist = tica.fit_transform(data.dat.tolist())
calculate covariances: 100% (2130/2130) [#############################################################################################################################] eta 00:01 |
In [23]: tica.dimension()
Out[23]: 595
In [24]: np.shape(dataticalist)
Out[24]: (2130, 500, 528)
@stefdoerr , I mean when calling transform...
In [25]: dataticalist = tica.fit_transform(data.dat.tolist(), epsilon=1E-17)
calculate covariances: 100% (2130/2130) [#############################################################################################################################] eta 00:01 -
In [26]: np.shape(dataticalist)
Out[26]: (2130, 500, 528)
Why would that make a difference if I pass it in the constructor or transform.
I thought it would...sorry, I got that wrong...we'll look into this, thanks!
You can also call it with 0, but that won't retain all dimensions either, because numerically the eigenvalues of an SPD matrix can be negative, and the largest norm of a negative eigenvalue will determine the minimum epsilon being used.
Am 16/12/15 um 16:38 schrieb Guillermo Pérez-Hernández:
And perhaps add the role of epsilon to the docs . @stefdoerr https://github.com/stefdoerr , just out curiosity, can you call transform with an epsilon close to zero, say 1e-17?
— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/653#issuecomment-165147278.
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354 Web: research.franknoe.de
I called it with 0 but I got an error for NaN/Infs
@fabian-paul should have a look at it and the output dimension. I'm under water til friday.
Am 16/12/15 um 16:56 schrieb Stefan:
I called it with 0 but I got an error for NaN/Infs
— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/653#issuecomment-165153731.
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354 Web: research.franknoe.de
Ok, let's see what I can do. First of all tica.dimension() must be fixed. When the correlation matrix doesn't have full rank, there is no inverse. But you can compute the pseudo-inverse. @stefdoerr Would this help for computing the reconstruction error? Ah, and the inverse of the eigenvector matrix doesn't equal its transpose. TICA is not an orthogonal coordinate transform.
The wrong dimension is coming from this part of the code https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/coordinates/transform/tica.py#L168
d = None
if self._dim != -1: # fixed parametrization
d = self._dim
elif self._parametrized: # parametrization finished. Dimension is known
dim = len(self._eigenvalues)
if self._var_cutoff < 1.0: # if subspace_variance, reduce the output dimension if needed
dim = min(dim, np.searchsorted(self._cumvar, self._var_cutoff)+1)
d = dim
elif self._var_cutoff == 1.0: # We only know that all dimensions are wanted, so return input dim
d = self.data_producer.dimension()
else: # We know nothing. Give up
raise RuntimeError('Requested dimension, but the dimension depends on the cumulative variance and the '
'transformer has not yet been parametrized. Call parametrize() before.')
Does anybody know why we have the first branch in that if clause at all? What is the point of having a fixed parametrization when it can contradict the data? Why is this needed?
I think that we should at least change the first line to
if self._dim != -1 and not self._parametrized: ...
You're right. It should be checked if the parametrization has been done prior blindly returning the fixed dimension.
@fabian-paul Yes thanks! Didn't know the TICA transform was not orthogonal. The pseudo-inverse worked great. Was wondering why the reconstruction costs were so huge before, hehe.
Say I want for some reason to calculate the reconstruction error of TICA. The TICA class does not have a inverse_transform method implemented. Looking at
_transform_array
this would be the correct inverse transform, right?