Closed stefdoerr closed 7 years ago
That sounds like a bug. Can you provide the transition matrix for which this happens?
Am 29/11/16 um 17:49 schrieb Stefan Doerr:
If I ask for lots of macrostates (7 in this case) from PCCA I sometimes get less macrostates back which is ok.
In [40]:set(model.msm.metastable_assignments) Out[40]: {0,1,5,6}
However if I then look at the metastable_memberships it still has the original shape.
In [41]: model.msm.metastable_memberships.shape Out[41]: (1195,7)
If I want to calculate the stationary probability of the PCCA macrostates I weigh the |stationary_distribution| by the |metastable_memberships| like this:
macronum= 4 macroindexes= list(set(self.msm.metastable_assignments)) for iin range(macronum): macroeq[i]= np.sum(msm.metastable_memberships[:, macroindexes[i]]* msm.stationary_distribution)
This however doesn't sum to 1 anymore (it does if there are no empty macrostates).
In [14]: np.sum(macroeq) Out[14]:0.13138681256306534
I don't understand why you keep macrostates which are empty. Also if you do keep them how can I correctly calculate the PCCA stationary probability?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQjKY9BmlQ0VMrdTZ-uBG6uw33Y-wks5rDFeagaJpZM4K_NOP.
--
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
you could access the stationary vector of the coarse grained transition matrix like this:
pcca = msm.pcca(m)
pcca.coarse_grained_stationary_probability
I do not know what would be affected if we reshape the memberships to its argmax, as this would also imply that we enforce a crisp clustering.
Same here though. There are only theoretically 7 macrostates but in practice there are 5. Should I renormalize the probabilities? How do we proceed when we get less macrostates than asked?
In [15]: macroindexes = list(set(msm.metastable_assignments))
In [16]: np.sum(pcca.coarse_grained_stationary_probability[macroindexes])
Out[16]: 0.96595443302704598
I think this is a bug, someone needs to review the PCCA code and treat this case. I remember that less macrostates are provided if the matrix doesn't allow as many macrostates as asked, but I don't exactly remember what the conditions for this were and I don't yet know what Stefan's case is about. The fact that later matrices are created with more states than are available is clearly wrong, so the handshake between the MSM estimator and the PCCA decomposition seems to have a problem.
Am 30/11/16 um 14:17 schrieb Stefan Doerr:
Same here though. There are only theoretically 7 macrostates but in practice there are 5. Should I renormalize the probabilities? How do we proceed when we get less macrostates than asked?
In [15]: macroindexes= list(set(msm.metastable_assignments)) In [16]: np.sum(pcca.coarse_grained_stationary_probability[macroindexes]) Out[16]:0.96595443302704598
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-263870934, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQtJ-Sty9lQr8EFWcjyurQIykp4qMks5rDXd5gaJpZM4K_NOP.
--
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
FYI: We made an issue on it a year ago https://github.com/markovmodel/PyEMMA/issues/564
Any news on this?
No sorry, ERC and Revision stress.
Am 09/01/17 um 15:46 schrieb Stefan Doerr:
Any news on this?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-271302047, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQnnfZn25uzpvRdRwjBAYaBjaJyD1ks5rQkhXgaJpZM4K_NOP.
--
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
Hey @franknoe, any news on this issue?
@franknoe is there any news here.
Hi, I have too many things going on to do this myself. @fnueske , you understand PCCA++ well, would you be so kind to have a quick look?
Sure, I'll try to understand what's going on. Please give me a few days.
Thanks a lot @fnueske! If you need any help, Stefan and I can assist, special @stefdoerr, who's more into the problem
Thank you!
Am 09/05/17 um 22:25 schrieb Feliks Nüske:
Sure, I'll try to understand what's going on. Please give me a few days.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-300290814, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQva_hR1jbx5spqJX8mMER9yMvc-nks5r4MvSgaJpZM4K_NOP.
--
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
At first sight it seems this is not a real bug, just a weird outcome of the calculation: pcca does compute seven macrostates, as you have observed:
`T = np.load("Tm.npy") M = pmsm.markov_model(T) M.pcca(7) chi = M.metastable_memberships print chi.shape
(1195, 7)`
But none of the microstates is assigned to states 2, 3 or 4:
np.unique(M.metastable_assignments) array([0, 1, 5, 6])
The metastable assignment just takes the argmax of M.metastable_membership along each row. If there are very fuzzy pcca sets, I guess this can happen. In this example, many of the microstates have a maximal degree of membership to any of the macrostates below 0.6. But I wonder if this is more of a problem of the model and not so much a bug in the code. Should we issue a warning in this case? What do you all think about the problem?
I see. Yes, this is not a bug. It could occur if there is no spectral gap after 7 states, and indicates that 7 states is just not a good choice in this case. HMMs or other coarse-graining approaches might be able to do better, but that's a different issue. I agree that issuing a warning in this case would be useful, perhaps after the pcca call.
Am 24/05/17 um 17:36 schrieb Feliks Nüske:
At first sight it seems this is not a real bug, just a weird outcome of the calculation: pcca does compute seven macrostates, as you have observed:
`T = np.load("Tm.npy") M = pmsm.markov_model(T) M.pcca(7) chi = M.metastable_memberships print chi.shape
(1195, 7)`
But none of the microstates is assigned to states 2, 3 or 4: |np.unique(M.metastable_assignments) array([0, 1, 5, 6])|
The metastable assignment just takes the argmax of M.metastable_membership along each row. If there are very fuzzy pcca sets, I guess this can happen. In this example, many of the microstates have a maximal degree of membership to any of the macrostates below 0.6. But I wonder if this is more of a problem of the model and not so much a bug in the code. Should we issue a warning in this case? What do you all think about the problem?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-303762646, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQm1VGfjxBanwSADPoEnRj3oUzzlqks5r9E57gaJpZM4K_NOP.
--
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
I agree it can produce less states but what about the stationary probabilities? It's a problem if they don't sum to 1 no?
Yes. Feliks, can you look into that?
Am 26/05/17 um 02:26 schrieb Stefan Doerr:
I agree it can produce less states but what about the stationary probabilities? It's a problem if they don't sum to 1 no?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/996#issuecomment-304159077, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQtKDIWY5c-6W6rAycLCbywbBgdzDks5r9hwYgaJpZM4K_NOP.
--
Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin
@stefdoerr : In your code, you are only summing up the stationary probabilities of those four macrostates to which microstates are assigned. But there are significant non-zero memberships to the other states as well. If you invoke (same variables as above)
pi_c = np.dot(M.stationary_distribution, chi)
print pi_c.sum()
1.0
all memberships are taken into account and the result is correct. In this example, the crisp assignment of microstates to macro states is really misleading as the decomposition is very fuzzy. As Frank said, 7 pcca states is probably not a good choice here. Does all of that make sense?
Ah I see. So it's an artifact of the crisp assignmen that some macros are left without micros. I will have a look at it when I'm back.
Ok, so I decided to essentially just decrease the number of macrostates until pcca returns the correct amount. I think this is the best solution from a user perspective:
modelflag = False
while not modelflag:
self.coarsemsm = self.msm.pcca(macronum)
if len(np.unique(self.msm.metastable_assignments)) != macronum:
macronum -= 1
print('PCCA returned empty macrostates. Reducing the number of macrostates to {}'.format(macronum))
else:
modelflag = True
For me, we can close this issue. I don't know if you want to throw a warning in such cases. Up to you.
Thanks also for all the help @fnueske !
If I ask for lots of macrostates (7 in this case) from PCCA I sometimes get less macrostates back which is ok.
However if I then look at the metastable_memberships it still has the original shape.
If I want to calculate the stationary probability of the PCCA macrostates I weigh the
stationary_distribution
by themetastable_memberships
like this:This however doesn't sum to 1 anymore (it does if there are no empty macrostates).
I don't understand why you keep macrostates which are empty. Also if you do keep them how can I correctly calculate the PCCA stationary probability?