markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
310 stars 119 forks source link

stationary probability of PCCA states #996

Closed stefdoerr closed 7 years ago

stefdoerr commented 7 years ago

If I ask for lots of macrostates (7 in this case) from PCCA I sometimes get less macrostates back which is ok.

In [40]: set(model.msm.metastable_assignments)
Out[40]: {0, 1, 5, 6}

However if I then look at the metastable_memberships it still has the original shape.

In [41]: model.msm.metastable_memberships.shape
Out[41]: (1195, 7)

If I want to calculate the stationary probability of the PCCA macrostates I weigh the stationary_distribution by the metastable_memberships like this:

macronum = 4
macroindexes = list(set(self.msm.metastable_assignments))
for i in range(macronum):
    macroeq[i] = np.sum(msm.metastable_memberships[:, macroindexes[i]] * msm.stationary_distribution)

This however doesn't sum to 1 anymore (it does if there are no empty macrostates).

In [14]: np.sum(macroeq)
Out[14]: 0.13138681256306534

I don't understand why you keep macrostates which are empty. Also if you do keep them how can I correctly calculate the PCCA stationary probability?

franknoe commented 7 years ago

That sounds like a bug. Can you provide the transition matrix for which this happens?

Am 29/11/16 um 17:49 schrieb Stefan Doerr:

If I ask for lots of macrostates (7 in this case) from PCCA I sometimes get less macrostates back which is ok.

In [40]:set(model.msm.metastable_assignments) Out[40]: {0,1,5,6}

However if I then look at the metastable_memberships it still has the original shape.

In [41]: model.msm.metastable_memberships.shape Out[41]: (1195,7)

If I want to calculate the stationary probability of the PCCA macrostates I weigh the |stationary_distribution| by the |metastable_memberships| like this:

macronum= 4 macroindexes= list(set(self.msm.metastable_assignments)) for iin range(macronum): macroeq[i]= np.sum(msm.metastable_memberships[:, macroindexes[i]]* msm.stationary_distribution)

This however doesn't sum to 1 anymore (it does if there are no empty macrostates).

In [14]: np.sum(macroeq) Out[14]:0.13138681256306534

I don't understand why you keep macrostates which are empty. Also if you do keep them how can I correctly calculate the PCCA stationary probability?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQjKY9BmlQ0VMrdTZ-uBG6uw33Y-wks5rDFeagaJpZM4K_NOP.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

stefdoerr commented 7 years ago

tm.zip

marscher commented 7 years ago

you could access the stationary vector of the coarse grained transition matrix like this:

pcca = msm.pcca(m)
pcca.coarse_grained_stationary_probability

I do not know what would be affected if we reshape the memberships to its argmax, as this would also imply that we enforce a crisp clustering.

stefdoerr commented 7 years ago

Same here though. There are only theoretically 7 macrostates but in practice there are 5. Should I renormalize the probabilities? How do we proceed when we get less macrostates than asked?

In [15]: macroindexes = list(set(msm.metastable_assignments))
In [16]: np.sum(pcca.coarse_grained_stationary_probability[macroindexes])
Out[16]: 0.96595443302704598
franknoe commented 7 years ago

I think this is a bug, someone needs to review the PCCA code and treat this case. I remember that less macrostates are provided if the matrix doesn't allow as many macrostates as asked, but I don't exactly remember what the conditions for this were and I don't yet know what Stefan's case is about. The fact that later matrices are created with more states than are available is clearly wrong, so the handshake between the MSM estimator and the PCCA decomposition seems to have a problem.

Am 30/11/16 um 14:17 schrieb Stefan Doerr:

Same here though. There are only theoretically 7 macrostates but in practice there are 5. Should I renormalize the probabilities? How do we proceed when we get less macrostates than asked?

In [15]: macroindexes= list(set(msm.metastable_assignments)) In [16]: np.sum(pcca.coarse_grained_stationary_probability[macroindexes]) Out[16]:0.96595443302704598

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-263870934, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQtJ-Sty9lQr8EFWcjyurQIykp4qMks5rDXd5gaJpZM4K_NOP.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

stefdoerr commented 7 years ago

FYI: We made an issue on it a year ago https://github.com/markovmodel/PyEMMA/issues/564

stefdoerr commented 7 years ago

Any news on this?

franknoe commented 7 years ago

No sorry, ERC and Revision stress.

Am 09/01/17 um 15:46 schrieb Stefan Doerr:

Any news on this?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-271302047, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQnnfZn25uzpvRdRwjBAYaBjaJyD1ks5rQkhXgaJpZM4K_NOP.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

j3mdamas commented 7 years ago

Hey @franknoe, any news on this issue?

j3mdamas commented 7 years ago

@franknoe is there any news here.

franknoe commented 7 years ago

Hi, I have too many things going on to do this myself. @fnueske , you understand PCCA++ well, would you be so kind to have a quick look?

fnueske commented 7 years ago

Sure, I'll try to understand what's going on. Please give me a few days.

j3mdamas commented 7 years ago

Thanks a lot @fnueske! If you need any help, Stefan and I can assist, special @stefdoerr, who's more into the problem

franknoe commented 7 years ago

Thank you!

Am 09/05/17 um 22:25 schrieb Feliks Nüske:

Sure, I'll try to understand what's going on. Please give me a few days.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-300290814, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQva_hR1jbx5spqJX8mMER9yMvc-nks5r4MvSgaJpZM4K_NOP.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

fnueske commented 7 years ago

At first sight it seems this is not a real bug, just a weird outcome of the calculation: pcca does compute seven macrostates, as you have observed:

`T = np.load("Tm.npy") M = pmsm.markov_model(T) M.pcca(7) chi = M.metastable_memberships print chi.shape

(1195, 7)`

But none of the microstates is assigned to states 2, 3 or 4: np.unique(M.metastable_assignments) array([0, 1, 5, 6])

The metastable assignment just takes the argmax of M.metastable_membership along each row. If there are very fuzzy pcca sets, I guess this can happen. In this example, many of the microstates have a maximal degree of membership to any of the macrostates below 0.6. But I wonder if this is more of a problem of the model and not so much a bug in the code. Should we issue a warning in this case? What do you all think about the problem?

franknoe commented 7 years ago

I see. Yes, this is not a bug. It could occur if there is no spectral gap after 7 states, and indicates that 7 states is just not a good choice in this case. HMMs or other coarse-graining approaches might be able to do better, but that's a different issue. I agree that issuing a warning in this case would be useful, perhaps after the pcca call.

Am 24/05/17 um 17:36 schrieb Feliks Nüske:

At first sight it seems this is not a real bug, just a weird outcome of the calculation: pcca does compute seven macrostates, as you have observed:

`T = np.load("Tm.npy") M = pmsm.markov_model(T) M.pcca(7) chi = M.metastable_memberships print chi.shape

(1195, 7)`

But none of the microstates is assigned to states 2, 3 or 4: |np.unique(M.metastable_assignments) array([0, 1, 5, 6])|

The metastable assignment just takes the argmax of M.metastable_membership along each row. If there are very fuzzy pcca sets, I guess this can happen. In this example, many of the microstates have a maximal degree of membership to any of the macrostates below 0.6. But I wonder if this is more of a problem of the model and not so much a bug in the code. Should we issue a warning in this case? What do you all think about the problem?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-303762646, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQm1VGfjxBanwSADPoEnRj3oUzzlqks5r9E57gaJpZM4K_NOP.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

stefdoerr commented 7 years ago

I agree it can produce less states but what about the stationary probabilities? It's a problem if they don't sum to 1 no?

franknoe commented 7 years ago

Yes. Feliks, can you look into that?

Am 26/05/17 um 02:26 schrieb Stefan Doerr:

I agree it can produce less states but what about the stationary probabilities? It's a problem if they don't sum to 1 no?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-304159077, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQtKDIWY5c-6W6rAycLCbywbBgdzDks5r9hwYgaJpZM4K_NOP.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

fnueske commented 7 years ago

@stefdoerr : In your code, you are only summing up the stationary probabilities of those four macrostates to which microstates are assigned. But there are significant non-zero memberships to the other states as well. If you invoke (same variables as above)

pi_c = np.dot(M.stationary_distribution, chi)
print pi_c.sum()
1.0

all memberships are taken into account and the result is correct. In this example, the crisp assignment of microstates to macro states is really misleading as the decomposition is very fuzzy. As Frank said, 7 pcca states is probably not a good choice here. Does all of that make sense?

stefdoerr commented 7 years ago

Ah I see. So it's an artifact of the crisp assignmen that some macros are left without micros. I will have a look at it when I'm back.

stefdoerr commented 7 years ago

Ok, so I decided to essentially just decrease the number of macrostates until pcca returns the correct amount. I think this is the best solution from a user perspective:

modelflag = False
while not modelflag:
    self.coarsemsm = self.msm.pcca(macronum)
    if len(np.unique(self.msm.metastable_assignments)) != macronum:
        macronum -= 1
        print('PCCA returned empty macrostates. Reducing the number of macrostates to {}'.format(macronum))
    else:
        modelflag = True

For me, we can close this issue. I don't know if you want to throw a warning in such cases. Up to you.

stefdoerr commented 7 years ago

Thanks also for all the help @fnueske !