Closed rmcgibbo closed 8 years ago
I suppose this could be beneficial. We mainly just need to carefully define what we mean by "project".
In some of my code, I was actually doing this, which would not be necessary if we had mean-subtracted projection.
x = tica.project(trj, which=arange(5))
x = sklearn.preprocessing.StandardScaler().fit_transform(x)
I accidentally hit enter before I finished my comment. See above for point 2.
IMHO we shouldn't be writing our own PCA solver.
Agreed.
I suppose we should get Christian's buy-in before we axe his PCA code, but my vote is to just force users to input a positive lagtime and get rid of all the extra branches.
Actually, my test case is subject to the difference in project. But the point is to say that on line 208, we're incrementing sum_t_dt by sum_t, when I'm pretty sure that's not the right behavior. They should be equal with lag==0
tica = tICA(lag=0)
for i in range(10):
    print 'iteration', i
    tica.train(prep_trajectory=X1)
    np.testing.assert_array_equal(tica.sum_t, tica.sum_t_dt)
$ python test.py
08:01:52 - no metric specified, you must pass prepared trajectories to the train and project methods
iteration 0
iteration 1
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    np.testing.assert_array_equal(tica.sum_t, tica.sum_t_dt)
  File "/Users/rmcgibbo/miniconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 718, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Users/rmcgibbo/miniconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal
(mismatch 100.0%)
 x: array([ 3.97145422, -11.23479935, 0.31692569])
 y: array([ 5.95718132, -16.85219903, 0.47538853])
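For what it's worth, y is exactly 1.5 times x in the failure above, which is what you would get after two train calls if sum_t_dt were incremented by the cumulative sum_t (s + 2s = 3s) while sum_t itself holds 2s. Below is a minimal sketch of that reading. The RunningSums class and its buggy flag are hypothetical stand-ins written for illustration, not the msmbuilder code on line 208.

```python
import numpy as np


class RunningSums(object):
    """Hypothetical accumulator mimicking sum_t / sum_t_dt (not msmbuilder code)."""

    def __init__(self, n_features, buggy=False):
        self.sum_t = np.zeros(n_features)
        self.sum_t_dt = np.zeros(n_features)
        self.buggy = buggy

    def train(self, X, lag=0):
        a = X[:len(X) - lag]  # frames at time t
        b = X[lag:]           # frames at time t + lag
        self.sum_t += a.sum(axis=0)
        if self.buggy:
            # Assumed bug: add the *cumulative* sum_t instead of this chunk's sum.
            self.sum_t_dt += self.sum_t
        else:
            self.sum_t_dt += b.sum(axis=0)


rng = np.random.RandomState(0)
X1 = rng.randn(100, 3)  # made-up data standing in for a prepared trajectory

good = RunningSums(3)
bad = RunningSums(3, buggy=True)
for i in range(2):
    good.train(X1)
    bad.train(X1)

ratio = bad.sum_t_dt / bad.sum_t  # 1.5 in every component after two calls
```

With lag=0 the non-buggy accumulator keeps the two sums identical, while the buggy one drifts as soon as train is called a second time, which lines up with the assertion passing on iteration 0 and failing on iteration 1.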
I don't really care about the PCA code. I think I added it because the tICA code was lonely in the "reduce" folder.
As for mean subtracting, let's just do what sklearn does in PCA for consistency. Since we were only using tICA to calculate distances, it never mattered if we subtracted the mean.
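To make the mean-subtraction point concrete, here is a small numpy-only sketch. sklearn's PCA.transform centers the input with the fitted mean_ before projecting; the project function, the made-up data, and the SVD-derived components below are assumptions for illustration and are not the tICA code.

```python
import numpy as np


def project(X, components, mean=None):
    """Project rows of X onto `components` (one vector per row).

    If `mean` is given, subtract it first, which mirrors the centering
    that sklearn's PCA.transform performs.
    """
    if mean is not None:
        X = X - mean
    return np.dot(X, components.T)


rng = np.random.RandomState(0)
X = rng.randn(500, 4) + 10.0  # made-up data with a large offset
mean = X.mean(axis=0)

# Principal axes of the centered data, standing in for the tICs.
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:2]

raw = project(X, components)
centered = project(X, components, mean=mean)

# The two projections differ only by a constant offset per component.
offset = np.dot(mean, components.T)
```

Because the two results differ by a constant vector, differences between projected frames are unchanged, which is why the choice never mattered for distance calculations. It does matter once the projected coordinates are used directly, as in the StandardScaler workaround earlier in the thread.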
On Sun, Mar 23, 2014 at 8:02 AM, Robert McGibbon notifications@github.com wrote:
Actually, my test case is subject to the difference in project. But the point is to say that on line 208, we're incrementing sum_t_dt by sum_t, when I'm pretty sure that's not the right behavior. They should be equal with lag=0
FWIW, I noticed both of those issues while reimplementing the method, since I wanted sklearn compatible method names (fit, fit_transform, etc). That code is here: https://github.com/rmcgibbo/projector/blob/master/projector/models/tica.py
Don't we want sklearn compatible method names here as well?
(I think the answer is yes.)
But obviously there's no rush, I understand it might be easier to push things externally and later backport to MSMB.
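As a rough illustration of what sklearn-style names could look like on top of the existing train/project calls, here is a sketch of a thin adapter. Everything in it is hypothetical: LegacyModel is a toy stand-in (it only mean-centers and selects columns), its project signature is assumed from the usage earlier in this thread, and SklearnStyle is not the class from the projector repository.

```python
import numpy as np


class LegacyModel(object):
    """Toy stand-in exposing train/project; NOT the msmbuilder tICA class."""

    def __init__(self):
        self.n_frames = 0
        self.sum_t = None

    def train(self, prep_trajectory):
        X = np.asarray(prep_trajectory, dtype=float)
        if self.sum_t is None:
            self.sum_t = np.zeros(X.shape[1])
        self.sum_t += X.sum(axis=0)
        self.n_frames += len(X)

    def project(self, prep_trajectory, which):
        # Center with the running mean, then keep the requested columns.
        X = np.asarray(prep_trajectory, dtype=float)
        return (X - self.sum_t / self.n_frames)[:, which]


class SklearnStyle(object):
    """Hypothetical adapter giving fit / transform / fit_transform names."""

    def __init__(self, model, n_components):
        self.model = model
        self.n_components = n_components

    def fit(self, X, y=None):
        self.model.train(prep_trajectory=X)
        return self  # sklearn convention: fit returns the estimator

    def transform(self, X):
        return self.model.project(X, which=np.arange(self.n_components))

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)


X = np.arange(12.0).reshape(4, 3)
out = SklearnStyle(LegacyModel(), n_components=2).fit_transform(X)
```

Returning self from fit and accepting an unused y argument are the two conventions that let an estimator like this slot into sklearn pipelines.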
Actually, my test case is subject to the difference in project. But the point is to say that on line 208, we're incrementing sum_t_dt by sum_t, when I'm pretty sure that's not the right behavior. They should be equal with lag==0
I fixed this one at least
(1) Should tICA.predict subtract out the mean before dotting the timeseries into the tICs? This is, for example, how PCA works in sklearn.
(2) I think there's a bug for the PCA case on line 208. Here's the test case