connected sets and TRAM estimator

markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂

http://pyemma.org

GNU Lesser General Public License v3.0

306 stars 119 forks source link

connected sets and TRAM estimator #1052

Closed vvoelz closed 6 years ago

vvoelz commented 7 years ago

I've been trying to build TRAM models for a "toy" 9-state landscape from Monte Carlo trajectories that changes with pH. The idea is that we could get lots of productive sampling of transitions at low pH, with hopes of extrapolating slower rates at higher pH.

Here's a picture of a situation where we're just using two pH values -- pH 3.6 is the "biased" landscape, and pH 4.0 is "unbiased".

toy_apomb_landscape

For each pH, I sample the landscape with long trajectories, all starting from state index 8 (on the far right). As you can see, in this model the transition-state (TS) barriers get their own state, so trajectories that visit these states immediately leave the state in the next time step. The result is that states 5 through 8 get ergodically trimmed from the model.

Next, I tried adding an artificial self-count for each time a TS is visited; same problem.

Finally, after adding artificial self-, left- and right- counts, I got the TRAM estimator to stop trimming my states.

I'm pretty sure this is because the default connected set determination is for directed counts. Perhaps we could give users like myself more access to how the connected set is defined? I think it's important because there are so many applications where only "one way" transitions are observed (in this case 4->5 transitions will be exceeding rare/absent), yet we want to keep these states in the model.

Edit: Also, is there any way to turn off trimming to connected sets altogether?

--Vince

cc'ed: @cwehmeyer @franknoe @juanmit

franknoe commented 7 years ago

Hi Vince,

the technical side is better replied by Chris or Fabian. However, are you sure you have transitions towards states above 5 occur in one of the two ensembles? If states are ergodically connected in one of the two ensembles, then simply connectivity in the other ensemble should be enough.

Another option is that you might not have enough connectivity between ensembles in terms of thermodynamic overlap, such that coupling the two ensembles didn't help. Fabian can comment on that.

cheers - Frank.

Am 22/02/17 um 19:07 schrieb Vincent Voelz:

I've been trying to build TRAM models for a "toy" 9-state landscape from Monte Carlo trajectories that changes with pH. The idea is that we could get lots of productive sampling of transitions at low pH, with hopes of extrapolating slower rates at higher pH.

Here's a picture of a situation where we're just using /two/ pH values -- pH 3.6 is the "biased" landscape, and pH 4.0 is "unbiased".

toy_apomb_landscape https://cloud.githubusercontent.com/assets/123431/23224534/c79c0c66-f8fc-11e6-85e5-3d11bf41341f.png

For each pH, I sample the landscape with long trajectories, all starting from state index 8 (on the far right). As you can see, in this model the transition-state (TS) barriers get their own state, so trajectories that visit these states immediately leave the state in the next time step. The result is that all states 5 and above get ergodically trimmed

Next, I tried adding an artificial self-count for each time a TS is visited; same problem.

Finally, after adding artificial self-, left- and right- counts, I finally got the TRAM estimator to stop trimming my states.

I'm pretty sure this is because the default connected set determination is for /directed/ counts. Perhaps we could give users like myself more access to how the connected set is defined? I think it's important because there are so many applications where only "one way" transitions are observed (in this case 4->5 transitions will be exceeding rare/absent), yet we want to keep these states in the model.

--Vince

cc'ed: @cwehmeyer https://github.com/cwehmeyer @franknoe https://github.com/franknoe @juanmit https://github.com/juanmit

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1052, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQql0lX9IuuFOQrIXIDGQDr61k1qyks5rfHlOgaJpZM4MI_fM.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

franknoe commented 7 years ago

@wehmeyer or @fabian-paul, can you help here?

Am 22/02/17 um 19:07 schrieb Vincent Voelz:

I've been trying to build TRAM models for a "toy" 9-state landscape from Monte Carlo trajectories that changes with pH. The idea is that we could get lots of productive sampling of transitions at low pH, with hopes of extrapolating slower rates at higher pH.

Here's a picture of a situation where we're just using /two/ pH values -- pH 3.6 is the "biased" landscape, and pH 4.0 is "unbiased".

toy_apomb_landscape https://cloud.githubusercontent.com/assets/123431/23224534/c79c0c66-f8fc-11e6-85e5-3d11bf41341f.png

For each pH, I sample the landscape with long trajectories, all starting from state index 8 (on the far right). As you can see, in this model the transition-state (TS) barriers get their own state, so trajectories that visit these states immediately leave the state in the next time step. The result is that all states 5 and above get ergodically trimmed

Next, I tried adding an artificial self-count for each time a TS is visited; same problem.

Finally, after adding artificial self-, left- and right- counts, I finally got the TRAM estimator to stop trimming my states.

I'm pretty sure this is because the default connected set determination is for /directed/ counts. Perhaps we could give users like myself more access to how the connected set is defined? I think it's important because there are so many applications where only "one way" transitions are observed (in this case 4->5 transitions will be exceeding rare/absent), yet we want to keep these states in the model.

--Vince

cc'ed: @cwehmeyer https://github.com/cwehmeyer @franknoe https://github.com/franknoe @juanmit https://github.com/juanmit

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1052, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQql0lX9IuuFOQrIXIDGQDr61k1qyks5rfHlOgaJpZM4MI_fM.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

vvoelz commented 7 years ago

@franknoe thanks for the follow-up. No need for intense consternation over my results with this toy model; I'm asking a lot of it (!) and expect it to fail; more trying to figure out what TRAM is giving as output.

However, are you sure you have transitions towards states above 5 occur in one of the two ensembles? If states are ergodically connected in one of the two ensembles, then simply connectivity in the other ensemble should be enough.

I checked this out, and you are correct -- if I add some 4->5 backcounts, and nothing else, I do get a connected model, at least at low pH.

Another option is that you might not have enough connectivity between ensembles in terms of thermodynamic overlap, such that coupling the two ensembles didn't help. Fabian can comment on that.

Yes, this is going on too: at high pH, ensembles are trimmed to states 6,7,8. I assume by "thermodynamic overlap" do you mean a situation where the MBAR-like part of the TRAM estimator essentially predicts zero equilibrium population (i.e. free energies of -np.infty)

tram_obj.therm_energies [ -1.76382694e+00   2.86109576e-05   7.94309680e-01   1.15270378e+00
   1.32274495e+00   1.39688315e+00   1.42616031e+00]
[[  8.27148785e+00   4.75106770e+00   6.50519828e+00   1.45518931e+00
    7.56781137e+00   2.77260300e+00   7.99786410e+00  -1.47796608e+00
    5.96076943e+00]
 [  6.55730204e+00   3.61890133e+00   6.10350446e+00   2.37165166e+00
    8.05997044e+00  -1.97423549e+04   7.69459849e+00   2.98292411e+00
    8.13733817e+00]
 [            -inf             -inf             -inf             -inf
              -inf             -inf   7.56988395e+00   2.23797093e+00
    8.99351562e+00]
 [            -inf             -inf             -inf             -inf
              -inf             -inf   6.12308386e+00   1.49961593e+00
    9.16306535e+00]
 [            -inf             -inf             -inf             -inf
              -inf             -inf   6.62110403e+00   1.64326568e+00
    9.13162768e+00]
 [            -inf             -inf             -inf             -inf
              -inf             -inf             -inf             -inf
    9.21024037e+00]
 [            -inf             -inf             -inf             -inf
              -inf             -inf             -inf             -inf
    9.21024037e+00]]

franknoe commented 7 years ago

Not sure about the second question, this is better answered by @cwehmeyer or @fabian-paul.

Am 28/02/17 um 21:52 schrieb Vincent Voelz:

@franknoe https://github.com/franknoe thanks for the follow-up. No need for intense constrenation over my results with this toy model; I'm asking a lot of it (!) and expect it to fail; more trying to figure out what TRAM is giving as output.
However, are you sure you have transitions towards states above 5
occur in one of the
two ensembles? If states are ergodically connected in one of the two
ensembles, then simply connectivity in the other ensemble should
be enough.
I checked this out, and you are correct -- if I add some 4->5 backcounts, and nothing else, I do get a connected model, at least at low pH.
Another option is that you might not have enough connectivity between
ensembles in terms of thermodynamic overlap, such that coupling
the two
ensembles didn't help. Fabian can comment on that.
Yes, this is going on too: at high pH, ensembles are trimmed to states 6,7,8. I assume by "thermodynamic overlap" do you mean a situation where the MBAR-like part of the TRAM estimator essentially predicts zero equilibrium population (i.e. free energies of -np.infty)

|tram_obj.therm_energies [ -1.76382694e+00 2.86109576e-05 7.94309680e-01 1.15270378e+00 1.32274495e+00 1.39688315e+00 1.42616031e+00] [[ 8.27148785e+00 4.75106770e+00 6.50519828e+00 1.45518931e+00 7.56781137e+00 2.77260300e+00 7.99786410e+00 -1.47796608e+00 5.96076943e+00] [ 6.55730204e+00 3.61890133e+00 6.10350446e+00 2.37165166e+00 8.05997044e+00 -1.97423549e+04 7.69459849e+00 2.98292411e+00 8.13733817e+00] [ -inf -inf -inf -inf -inf -inf 7.56988395e+00 2.23797093e+00 8.99351562e+00] [ -inf -inf -inf -inf -inf -inf 6.12308386e+00 1.49961593e+00 9.16306535e+00] [ -inf -inf -inf -inf -inf -inf 6.62110403e+00 1.64326568e+00 9.13162768e+00] [ -inf -inf -inf -inf -inf -inf -inf -inf 9.21024037e+00] [ -inf -inf -inf -inf -inf -inf -inf -inf 9.21024037e+00]] |

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1052#issuecomment-283157026, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQtwoUpd3KOD_VwXkzTJ936M-P9Zcks5rhIkqgaJpZM4MI_fM.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

vvoelz commented 7 years ago

p.s. I can post an example of my code and input files if it's helpful. I'm certain I can "fix" my issues with better sampling, however.

franknoe commented 7 years ago

of course

Am 28/02/17 um 21:57 schrieb Vincent Voelz:

p.s. I can post an example of my code and input files if it's helpful. I'm certain I can "fix" my issues with better sampling, however.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1052#issuecomment-283158134, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQuT2Cwbe24VSRpjsDMyflgfWlYMGks5rhIpFgaJpZM4MI_fM.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 7 years ago

Hi Vincent,

I think it would help a lot to see what's going on, if you could post the count matrices.

First of all, it is not possible to not trim the set of states to the ergodically connected set. (At least with the current TRAM theory.) If the set of states is not trimmed and there is an absorbing state (or a set of absorbing states), that state will get a probability of one and the remaining states a probability of zero. To arrive at a solution where one or more probabilites are exactly zero, TRAM might take a very long time to converge (days to weeks) so we opted for trimming the set of states by default.

To get a grasp of what the "ergodically connected set" means in TRAM, I've shared a few slides from the presentation I gave last week https://gist.github.com/fabian-paul/9565bc0fcb0f225e60c969929a320a3e One should think of the TRAM states as states in the product space of Markov-states (clusters) and thermodynamic states (ensembles). So you can have the situation, where a given Markov state is in the connected set in ensemble A but at the same time is not in the connected set in ensemble B. To be in the connected set of TRAM states, a state must be reachable from any other state in the connected set. If you look at image TRAM-overlap-2.png in the Gist, one can reach all the red TRAM states by following a red transition arrow or by following a green line (which indicates local overlap of the distributions).

There a different choices for quantifying the overlap of distributions between different ensembles. These options a fully documented in thermotools, see https://github.com/markovmodel/thermotools/blob/devel/thermotools/cset.py#L35 And we are currently working on improving the TRAM documentation. The default setting for determining overlap between ensembles in TRAM is connectivity='summed_count_matrix' which means that the overlap is not checked and it is assumed that the distributions overlap. A more conservative choice would be connectivity='post_hoc_RE' which is explained in the thermotools documentation and on slide TRAM-overlap-1.png in the Gist.

Hope that helps, Fabian

fabian-paul commented 7 years ago

@vvoelz : I might have made a mistake in the explanation / definition of the connected set for TRAM. Please let me check this again.

fabian-paul commented 7 years ago

Hi all,

I'm quite sure now that I made a subtle mistake in the definition of connectivity. Thanks to @giopina for bringing it to my attention.

Initially I though that the following sampling situation (A) would be a valid input for TRAM:

E1:  0   -> 1  -> 2

E0:  0  <-  1  <- 2

E0 and E1 are two ensembles. The unidirectional arrows denote unidirectional transitions between different Markov states (0,1,2). Unidirectional means that the reverse direction was not sampled in the data.

What might actually be needed is the following situation (B)

E1:  0  <-> 1  -> 2           

E0:  0  <-  1 <-> 2

with more reversible transitions between the states. Here the reversible transitions (<->) form a chain that connects all Markov states.

It can be seen that there is some problem with (A), because the TRAM estimators don't converge (or take extremely long to converge and the thermodynamic estimates come out wrong). I think this can also be rationalized easily by looking that the following toy example:

E1:  0  -> 1            

E0:  0  <- 1

In this situation one can easily compute T¹₀₁ and T⁰₁₀. We also know the per-Markov-state reweighting factors g₀ and g₁ that relate the stationary distributions of the two ensembles:

pi¹₀ = g₀pi⁰₀

pi¹₁ = g₁pi⁰₁

Furthermore we know the detailed balance relations that relate the transition probabilities: pi⁰₀ T⁰₀₁ = pi⁰₁ T⁰₁₀

pi¹₀ T¹₀₁ = pi¹₁ T¹₁₀

However these are still too little equations to solve for all unknowns.

Inserting these equations leads to: pi⁰₀ T⁰₀₁ = pi⁰₁ T⁰₁₀

g₀pi⁰₀ T¹₀₁ = g₁pi⁰₁ T¹₁₀

If we define r=pi⁰₀ / pi⁰₁, we can rewrite the equations as

r T⁰₀₁ = T⁰₁₀

g₀r T¹₀₁ = g₁ T¹₁₀

These are two equations for three unknowns, so it's clear that this can't be solved.

This insight leads directly to a different definition of the connected set. I have implemented the correct connectivity check here: https://github.com/markovmodel/thermotools/commit/5dfe538f770bcdd532cc256b62cd0cc759020733

I apologize for the confusion that I might have caused.

Before we go on, please let's discuss this.

Fabian

franknoe commented 7 years ago

Hm, seems unintuitive, but maybe you're right. We also have some normalization condition on the pi's (either they add up to 1 by ensemble or in total, depending on how exactly you define them). Still not enough?

Am 15/04/17 um 12:47 schrieb fabian-paul:

Hi all,

I'm quire sure now that I made a subtle mistake in the definition of connectivity. Thanks to @giopina https://github.com/giopina for bringing it to my attention.

Initially I though that the following sampling situation (A) would be a valid input for TRAM:

|E1: 0 -> 1 -> 2 E0: 0 <- 1 <- 2 |

E0 and E1 are two ensembles. The unidirectional arrows denote unidirectional transitions between different Markov states (0,1,2). Unidirectional means that the reverse direction was not sampled in the data.

What might actually be needed is the following situation (B)

|E1: 0 <-> 1 -> 2 E0: 0 <- 1 <-> 2 |

with more reversible transitions between the states. Here the reversible transitions (<->) form a chain that connects all Markov states.

It can be seen that there is some problem with (A), because the TRAM estimators don't converge (or take extremely long to converge and the thermodynamic estimates come out wrong). I think this can also be rationalized easily by looking that the following toy example:

|E1: 0 -> 1 E0: 0 <- 1 |

In this situation one can easily compute T^1 _01 and T^0 _10 . We also know the per-Markov-state reweighting factors g_0 and g_1 that relate the stationary distributions of the two ensembles:

pi^1 _0 = g_0 pi^0 _0

pi^1 _1 = g_1 pi^0 _1

Furthermore we know the detailed balance relations that relate the transition probabilities: pi^0 _0 T^0 _01 = pi^0 _1 T^0 _10

pi^1 _0 T^1 _01 = pi^1 _1 T^1 _10

However these are still too little equations to solve for all unknowns.

Inserting these equations leads to: pi^0 _0 T^0 _01 = pi^0 _1 T^0 _10

g_0 pi^0 _0 T^1 _01 = g_1 pi^0 _1 T^1 _10

If we define r=pi^0 _0 / pi^0 _1 , we can rewrite the equations as

r T^0 _01 = T^0 _10

g_0 r T^1 _01 = g_1 T^1 _10

These are two equations for three unknowns, so it's clear that this can't be solved.

This insight leads directly to a different definition of the connected set. I have implemented the correct connectivity check here: markovmodel/thermotools@5dfe538 https://github.com/markovmodel/thermotools/commit/5dfe538f770bcdd532cc256b62cd0cc759020733

I apologize for the confusion that I might have cause.

Before we go on, please let's discuss this.

Fabian

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1052#issuecomment-294285682, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQoHqF_r6U5rVYfYS1O4vicseAtlEks5rwKArgaJpZM4MI_fM.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

franknoe commented 7 years ago

PS you can try this simply example if it converges with dTRAM of course

Am 15/04/17 um 12:47 schrieb fabian-paul:

Hi all,

I'm quire sure now that I made a subtle mistake in the definition of connectivity. Thanks to @giopina https://github.com/giopina for bringing it to my attention.

Initially I though that the following sampling situation (A) would be a valid input for TRAM:

|E1: 0 -> 1 -> 2 E0: 0 <- 1 <- 2 |

E0 and E1 are two ensembles. The unidirectional arrows denote unidirectional transitions between different Markov states (0,1,2). Unidirectional means that the reverse direction was not sampled in the data.

What might actually be needed is the following situation (B)

|E1: 0 <-> 1 -> 2 E0: 0 <- 1 <-> 2 |

with more reversible transitions between the states. Here the reversible transitions (<->) form a chain that connects all Markov states.

It can be seen that there is some problem with (A), because the TRAM estimators don't converge (or take extremely long to converge and the thermodynamic estimates come out wrong). I think this can also be rationalized easily by looking that the following toy example:

|E1: 0 -> 1 E0: 0 <- 1 |

In this situation one can easily compute T^1 _01 and T^0 _10 . We also know the per-Markov-state reweighting factors g_0 and g_1 that relate the stationary distributions of the two ensembles:

pi^1 _0 = g_0 pi^0 _0

pi^1 _1 = g_1 pi^0 _1

Furthermore we know the detailed balance relations that relate the transition probabilities: pi^0 _0 T^0 _01 = pi^0 _1 T^0 _10

pi^1 _0 T^1 _01 = pi^1 _1 T^1 _10

However these are still too little equations to solve for all unknowns.

Inserting these equations leads to: pi^0 _0 T^0 _01 = pi^0 _1 T^0 _10

g_0 pi^0 _0 T^1 _01 = g_1 pi^0 _1 T^1 _10

If we define r=pi^0 _0 / pi^0 _1 , we can rewrite the equations as

r T^0 _01 = T^0 _10

g_0 r T^1 _01 = g_1 T^1 _10

These are two equations for three unknowns, so it's clear that this can't be solved.

This insight leads directly to a different definition of the connected set. I have implemented the correct connectivity check here: markovmodel/thermotools@5dfe538 https://github.com/markovmodel/thermotools/commit/5dfe538f770bcdd532cc256b62cd0cc759020733

I apologize for the confusion that I might have cause.

Before we go on, please let's discuss this.

Fabian

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1052#issuecomment-294285682, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQoHqF_r6U5rVYfYS1O4vicseAtlEks5rwKArgaJpZM4MI_fM.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 7 years ago

Well, I think there isn't any other equation that can be used. The normalization of pi and g doesn't matter, because only the ratios g₀/g₁, pi⁰₀/pi⁰₁ and pi¹₀/pi¹₁ enter.

Let's pick for example g₀=1 and g₁=2. Let the count matrix elements C⁰₁₀ > 0 and C¹₀₁ > 0 and the rest be zero. Then in the above example the following would be a valid solution

     1/3 2/3           0   1
T0 =            T1 = 
      1   0          3/4  1/4

and this as well

     1/4 3/4           0   1
T0 =            T1 = 
      1   0          2/3  1/3

Both have maximum (dTRAM) log-likelihood = 0.

I'm going to try this explicitly with dTRAM. Let me get back to you later.

fabian-paul commented 7 years ago

It can also be shown numerically with our dTRAM implementation that the result is not unique for the irreversible case introduced above. Here are the results of two dTRAM runs with different initial conditions for the optimization. The transition matrices agree with the constraints (Boltzmann factors) but are different in each run:

import numpy as np
import pyemma
import msmtools
np.set_printoptions(precision=1)

# prepare sample data for the toy example above
dtrajs = [ np.array([1,0]) ] * 10 + [ np.array([0,1]) ] * 10
ttrajs = [ np.array([0,0]) ] * 10 + [ np.array([1,1]) ] * 10
bias_energy = np.array([[0.0, 0.0], [-np.log(1.0), -np.log(2.0)]])

dtram = pyemma.thermo.DTRAM(bias_energy, 1)
dtram.estimate((ttrajs, dtrajs))

T0 = dtram.models[0].P
T1 = dtram.models[1].P
C0 = dtram.count_matrices[0,:,:]
C1 = dtram.count_matrices[1,:,:]
print 'log-likelihood', (C0[C0>0] * np.log(T0[C0>0])).sum() + (C1[C1>0]*np.log(T1[C1>0])).sum(), dtram.log_likelihood()

pi0 = msmtools.analysis.statdist(T0)
pi1 = msmtools.analysis.statdist(T1)
g0, g1 = pi1/pi0
print 'ratio of gammas', g1/g0, '(should be 2)'

print T0
print T1

log-likelihood -2.00000016549e-10 -2.00000016549e-10
ratio of gammas 2.0 (should be 2)
[[  3.0e-01   7.0e-01]
 [  1.0e+00   1.0e-11]]
[[  1.0e-11   1.0e+00]
 [  7.1e-01   2.9e-01]]

dtram = pyemma.thermo.DTRAM(bias_energy, 1)
pi_guess = np.array([10,1], dtype=np.float64)
pi_guess /= pi_guess.sum()
dtram.conf_energies =  -np.log(pi_guess) # different initial conditions
dtram.estimate((ttrajs, dtrajs))

T0 = dtram.models[0].P
T1 = dtram.models[1].P
C0 = dtram.count_matrices[0,:,:]
C1 = dtram.count_matrices[1,:,:]
print 'log-likelihood', (C0[C0>0] * np.log(T0[C0>0])).sum() + (C1[C1>0]*np.log(T1[C1>0])).sum(), dtram.log_likelihood()

pi0 = msmtools.analysis.statdist(T0)
pi1 = msmtools.analysis.statdist(T1)
g0, g1 = pi1/pi0
print 'ratio of gammas', g1/g0, '(should be 2)'

print T0
print T1

log-likelihood -2.00000016549e-10 -2.00000016549e-10
ratio of gammas 1.99999999999 (should be 2)
[[  2.7e-01   7.3e-01]
 [  1.0e+00   1.0e-11]]
[[  1.0e-11   1.0e+00]
 [  6.9e-01   3.1e-01]]

franknoe commented 7 years ago

That's pretty convincing.

So it seems we need a little more connectivity. But, what should we check for?

Am 26/04/17 um 23:41 schrieb fabian-paul:

It can also be shown numerically with our dTRAM implementation that the result is not unique for the irreversible case introduced above. Here are the results of two dTRAM runs with different initial conditions for the optimization. The transition matrices agree with the constrains (Boltzmann factors) but are different in each run:

import numpyas np import pyemma import msmtools np.set_printoptions(precision=1)

prepare sample data for the toy example above

dtrajs= [ np.array([1,0]) ] 10 + [ np.array([0,1]) ] 10 ttrajs= [ np.array([0,0]) ] 10 + [ np.array([1,1]) ] 10 bias_energy= np.array([[0.0,0.0], [-np.log(1.0),-np.log(2.0)]]) dtram= pyemma.thermo.DTRAM(bias_energy,1) dtram.estimate((ttrajs, dtrajs))

T0= dtram.models[0].P T1= dtram.models[1].P C0= dtram.count_matrices[0,:,:] C1= dtram.count_matrices[1,:,:] print 'log-likelihood', (C0[C0>0] np.log(T0[C0>0])).sum()+ (C1[C1>0]np.log(T1[C1>0])).sum(), dtram.log_likelihood()

pi0= msmtools.analysis.statdist(T0) pi1= msmtools.analysis.statdist(T1) g0, g1= pi1/pi0 print 'ratio of gammas', g1/g0,'(should be 2)'

print T0 print T1 |log-likelihood -2.00000016549e-10 -2.00000016549e-10 ratio of gammas 2.0 (should be 2) [[ 3.0e-01 7.0e-01] [ 1.0e+00 1.0e-11]] [[ 1.0e-11 1.0e+00] [ 7.1e-01 2.9e-01]] | dtram= pyemma.thermo.DTRAM(bias_energy,1) pi_guess= np.array([10,1],dtype=np.float64) pi_guess/= pi_guess.sum() dtram.conf_energies= -np.log(pi_guess)# different initial conditions dtram.estimate((ttrajs, dtrajs))

T0= dtram.models[0].P T1= dtram.models[1].P C0= dtram.count_matrices[0,:,:] C1= dtram.count_matrices[1,:,:] print 'log-likelihood', (C0[C0>0] np.log(T0[C0>0])).sum()+ (C1[C1>0]np.log(T1[C1>0])).sum(), dtram.log_likelihood()

pi0= msmtools.analysis.statdist(T0) pi1= msmtools.analysis.statdist(T1) g0, g1= pi1/pi0 print 'ratio of gammas', g1/g0,'(should be 2)'

print T0 print T1 |log-likelihood -2.00000016549e-10 -2.00000016549e-10 ratio of gammas 1.99999999999 (should be 2) [[ 2.7e-01 7.3e-01] [ 1.0e+00 1.0e-11]] [[ 1.0e-11 1.0e+00] [ 6.9e-01 3.1e-01]] |

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1052#issuecomment-297549527, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQlhII55gZn3YfVVXk0h7I7ZK5en5ks5rz7nvgaJpZM4MI_fM.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 7 years ago

This is what I would do: Documentation: https://github.com/markovmodel/thermotools/blob/fix-cset/thermotools/cset.py#L51

implementation 1 (ignoring overlap between thermodynamic states): https://github.com/markovmodel/thermotools/blob/fix-cset/thermotools/cset.py#L269

implementation 2 (also checking for overlap between thermodynamic states): https://github.com/markovmodel/thermotools/blob/fix-cset/thermotools/cset.py#L336

marscher commented 6 years ago

I think this has been fixed. Reopen it, if thats not the case.