markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

estimation failed due to alpha <= 0 #1383

Closed jiajiexiao closed 5 years ago

jiajiexiao commented 5 years ago

Hi,

Thanks for releasing and maintaining such a great package for Markov modeling on MD data. I'm submitting an issue that I ran into when trying to create an HMM. The failure occurs with an alpha <= 0 error when Gibbs sampling is called for the Bayesian estimator. The detailed error can be seen below:

msm.bayesian_hidden_markov_model(dtrajs, num_states, 10) 

Sampling HMSMs
0% 0/100 [00:00<?, ?it/s]
/Users/jj/anaconda/lib/python3.6/site-packages/msmtools/analysis/dense/decomposition.py:540: ImaginaryEigenValueWarning: Using eigenvalues with non-zero imaginary part
  warnings.warn('Using eigenvalues with non-zero imaginary part', ImaginaryEigenValueWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-79-e704c0a3e41e> in <module>()
----> 1 msm.bayesian_hidden_markov_model(dtrajs, num_states, 10)

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/msm/api.py in bayesian_hidden_markov_model(dtrajs, nstates, lag, nsamples, reversible, stationary, connectivity, mincount_connectivity, separate, observe_nonempty, stride, conf, dt_traj, store_hidden, show_progress)
   1316                                   separate=separate, observe_nonempty=observe_nonempty,
   1317                                   dt_traj=dt_traj, conf=conf, store_hidden=store_hidden, show_progress=show_progress)
-> 1318     return bhmsm_estimator.estimate(dtrajs)
   1319 
   1320 def estimate_augmented_markov_model(dtrajs, ftrajs, lag, m, sigmas,

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/_base/estimator.py in estimate(self, X, **params)
    410 
    411         """
--> 412         # set params
    413         if params:
    414             self.set_params(**params)

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/msm/estimators/bayesian_hmsm.py in _estimate(self, dtrajs)
    286 
    287         from bhmm import discrete_hmm, bayesian_hmm
--> 288 
    289         if self.init_hmsm is not None:
    290             hmm_mle = self.init_hmsm.hmm

/Users/jj/anaconda/lib/python3.6/site-packages/bhmm/api.py in bayesian_hmm(observations, estimated_hmm, nsample, reversible, stationary, p0_prior, transition_matrix_prior, store_hidden, call_back)
    465     # Sample models.
    466     sampled_hmms = sampler.sample(nsamples=nsample, save_hidden_state_trajectory=store_hidden,
--> 467                                   call_back=call_back)
    468     # return model
    469     from bhmm.hmm.generic_sampled_hmm import SampledHMM

/Users/jj/anaconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py in sample(self, nsamples, nburn, nthin, save_hidden_state_trajectory, call_back)
    248         for iteration in range(nsamples):
    249             logger().info("Iteration %8d / %8d" % (iteration, nsamples))
--> 250             # Run a number of Gibbs sampling updates to generate each sample.
    251             for thin in range(nthin):
    252                 self._update()

/Users/jj/anaconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py in _update(self)
    268         """
    269         initial_time = time.time()
--> 270 
    271         self._updateHiddenStateTrajectories()
    272         self._updateEmissionProbabilities()

/Users/jj/anaconda/lib/python3.6/site-packages/bhmm/estimators/bayesian_sampling.py in _updateEmissionProbabilities(self)
    331 
    332         """
--> 333         observations_by_state = [self.model.collect_observations_in_state(self.observations, state)
    334                                  for state in range(self.model.nstates)]
    335         self.model.output_model.sample(observations_by_state)

/Users/jj/anaconda/lib/python3.6/site-packages/bhmm/output_models/discrete.py in sample(self, observations_by_state)
    248             count += self.prior[i]
    249             positive = count > 0
--> 250             # if counts at all: can't sample, so leave output probabilities as they are.
    251             self._output_probabilities[i, positive] = dirichlet(count[positive])
    252 

mtrand.pyx in mtrand.RandomState.dirichlet()

ValueError: alpha <= 0

This also occurs for the following example:

its = msm.timescales_hmsm(dtrajs, num_states, lags=lags, errors='bayes', nsamples=250)

10-01-19 01:22:21 pyemma.msm.estimators.bayesian_hmsm.BayesianHMSM[164] WARNING  Ignored error during estimation: alpha <= 0
WARNING:pyemma.msm.estimators.bayesian_hmsm.BayesianHMSM[164]:Ignored error during estimation: alpha <= 0
/Users/jj/anaconda/lib/python3.6/site-packages/msmtools/analysis/dense/decomposition.py:540: ImaginaryEigenValueWarning: Using eigenvalues with non-zero imaginary part
  warnings.warn('Using eigenvalues with non-zero imaginary part', ImaginaryEigenValueWarning)
10-01-19 01:22:21 pyemma.msm.estimators.bayesian_hmsm.BayesianHMSM[165] WARNING  Ignored error during estimation: alpha <= 0

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-76-7df7724ff720> in <module>()
      4 lags = [1,2,3,5,7,10,15,20,50,80,100]
      5 #lags = 500
----> 6 its = msm.timescales_hmsm(dtrajs, num_states, lags=lags, errors='bayes', nsamples=250)

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/msm/api.py in timescales_hmsm(dtrajs, nstates, lags, nits, reversible, stationary, connectivity, mincount_connectivity, separate, errors, nsamples, stride, n_jobs, show_progress)
    968     itsobj = _ImpliedTimescales(estimator, lags=lags, nits=nits, n_jobs=n_jobs,
    969                                 show_progress=show_progress)
--> 970     itsobj.estimate(dtrajs)
    971     return itsobj
    972 

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/msm/estimators/implied_timescales.py in estimate(self, X, **params)
    146             estimation. None means the number of timescales will be automatically
    147             determined.
--> 148 
    149         n_jobs: int, optional
    150             how many subprocesses to start to estimate the models for each lag time.

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/_base/estimator.py in estimate(self, X, **params)
    410 
    411         """
--> 412         # set params
    413         if params:
    414             self.set_params(**params)

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/msm/estimators/implied_timescales.py in _estimate(self, dtrajs)
    202         # run estimation on all lag times
    203         pg = ProgressReporter()
--> 204         with pg.context():
    205             models, estimators = estimate_param_scan(self.estimator, dtrajs, param_sets, failfast=False,
    206                                                      return_estimators=True, n_jobs=self.n_jobs,

/Users/jj/anaconda/lib/python3.6/site-packages/pyemma/msm/estimators/implied_timescales.py in _postprocess_results(self, models)
    219 
    220         if len(good) != len(models):
--> 221             def _format_failed_models():
    222                 errors = []
    223                 for b in bad:

RuntimeError: Estimation has failed at ALL lagtimes. Details:
['Error at lag time 1: alpha <= 0',
 'Error at lag time 2: alpha <= 0',
 'Error at lag time 3: alpha <= 0',
 'Error at lag time 5: alpha <= 0',
 'Error at lag time 7: alpha <= 0',
 'Error at lag time 10: alpha <= 0',
 'Error at lag time 15: alpha <= 0',
 'Error at lag time 20: alpha <= 0',
 'Error at lag time 50: alpha <= 0',
 'Error at lag time 80: alpha <= 0',
 'Error at lag time 100: alpha <= 0']

Any clue what's wrong? How can the alpha value be <= 0, and what does it mean? Many thanks!

jiajiexiao commented 5 years ago

Conda env is attached: conda.txt

marscher commented 5 years ago

hi @xiaoj12,

thanks for appreciating our software and for this complete bug report. The message is caused by non-positive elements in one of the rows of the count matrix. However, the posted stack trace is contradictory: in the last file (discrete.py of bhmm) we explicitly check for positivity and omit non-positive elements in the Dirichlet sampling step, yet the stack claims it executed the comment line, which can never be true. So I suspect that your conda environment is somehow broken (e.g. the installed bhmm version is actually 0.6.2, while the source code shown in the stack comes from 0.6.3). I have no idea why this happens, but the observation leads to no other conclusion.

In addition, you have two versions of numpy in the same environment: an outdated one from conda and a recent one installed with pip. I strongly discourage mixing conda and pip packages (sometimes it is inevitable, but that does not seem to be the case here). I bet the same estimation works perfectly in a fresh environment. Would you please try it out?
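
To illustrate the mechanism (just a minimal sketch with a made-up counts row, not your actual data): NumPy refuses to draw a Dirichlet sample when any concentration parameter is non-positive, which is exactly the ValueError in your traceback, and it is why bhmm masks the counts before sampling, as the quoted lines from discrete.py show.

import numpy as np

# hypothetical counts for one hidden state; the zero entry is what trips dirichlet()
counts = np.array([5.0, 0.0, 3.0])

try:
    np.random.dirichlet(counts)
except ValueError as err:
    print(err)  # "alpha <= 0", the same error as above (NumPy validates alpha)

# the guard used in bhmm's discrete output model: sample only over the positive entries
positive = counts > 0
probs = np.zeros_like(counts)
probs[positive] = np.random.dirichlet(counts[positive])
print(probs)  # the zero-count entry keeps probability 0, the rest is Dirichlet-distributed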

marscher commented 5 years ago

for your convenience

conda create -n new pyemma
conda activate new
python your_estimation_script.py
jiajiexiao commented 5 years ago

@marscher Thank you so much for your quick reply. The issue was gone after I set up a new environment for pyemma. Greatly appreciated!

marscher commented 5 years ago

You're most welcome. Issues of this kind are very hard to fix on your own. It should be noted that the problem likely arose from mixing pip with conda packages and from having two pip versions (9.0 and a recent one) installed.
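
If anyone else runs into a similar setup, a quick sanity check (just a sketch; the exact versions and paths will differ) is to ask the interpreter which numpy and pip it actually imports and compare that against what conda list reports for the same environment:

import numpy
import pip

# if these versions or paths disagree with the output of `conda list`,
# the environment is most likely a mixed conda/pip install
print("numpy", numpy.__version__, "from", numpy.__file__)
print("pip", pip.__version__, "from", pip.__file__)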