Open s-gordon opened 10 years ago
So I'm not sure which part of the calculation is crashing, but this does happen sometimes.
The easiest workaround for now is to use fewer lagtimes or fewer states. I think this bug occurs more often at longer lagtimes or with more states, but I'm not 100% sure.
Hmmm. I've just tried halving the number of states from ~1700 to 800, and this is what I'm getting now:
/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py:486: SparseEfficiencyWarning: changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
/usr/local/lib/python2.7/dist-packages/msmbuilder-2.7.dev-py2.7-linux-x86_64.egg/msmbuilder/MSMLib.py:592: RuntimeWarning: invalid value encountered in double_scalars
logger.info("Selected component %d with population %f", ComponentInd, ComponentPops[ComponentInd] / ComponentPops.sum())
10:17:31 - Selected component 0 with population nan
10:17:31 - Calculating implied timescales at lagtime 15
Traceback (most recent call last):
File "/usr/local/bin/CalculateImpliedTimescales.py", line 5, in <module>
pkg_resources.run_script('msmbuilder==2.7.dev', 'CalculateImpliedTimescales.py')
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.7.dev-py2.7-linux-x86_64.egg/EGG-INFO/scripts/CalculateImpliedTimescales.py", line 82, in <module>
(not args.notrim), args.symmetrize, args.procs)
File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.7.dev-py2.7-linux-x86_64.egg/EGG-INFO/scripts/CalculateImpliedTimescales.py", line 64, in run
trimming=trimming, symmetrize=symmetrize, n_procs=nProc)
File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.7.dev-py2.7-linux-x86_64.egg/msmbuilder/msm_analysis.py", line 185, in get_implied_timescales
lags = result.get(999999)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
raise self._value
IndexError: invalid index
Line 5 worries me the most. Further decreasing the number of states does not seem to solve the problem, although I haven't seen any crashes yet.
Never mind; I think I've found a solution. The script only works when the sampled lag times are multiples of the stride (50 in my case). Any other lag time either crashes the script or produces an "IndexError: invalid index" error.
While I've found a solution, I'm not sure that I understand the philosophy behind it...
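For anyone hitting the same thing, here is a minimal sketch of picking lag times that respect that constraint. It is not part of msmbuilder: the function name `lagtimes_on_stride` and its parameters are made up for illustration, and it assumes the lag times and the stride are expressed in the same units (frames).

```python
def lagtimes_on_stride(max_lag, interval, stride):
    """Return lag times from interval to max_lag (in steps of interval)
    that are exact multiples of the clustering stride.

    Hypothetical helper for illustration; not an msmbuilder function.
    """
    if interval <= 0 or stride <= 0:
        raise ValueError("interval and stride must be positive integers")
    candidates = range(interval, max_lag + 1, interval)
    # Keep only the lag times that land on frames used during clustering.
    return [lag for lag in candidates if lag % stride == 0]


# With a stride of 50, lag times such as 25 or 75 are dropped.
print(lagtimes_on_stride(300, 25, 50))
```

The resulting list can then be used to decide which lag time range and interval to pass to the script, so that no lag time falls between the strided frames.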
OK, I think I know what's going on. It's actually not possible to extend a hierarchical clustering to frames sampled more frequently than the stride used during clustering, so lagtimes that are not multiples of that stride fail. This is because there is no concept of "generator" or "cluster center" in hierarchical clustering.
For k-centers, k-medoids, and hybrid, there IS the concept of a generator, which allows you to transfer (or apply) your clustering to new data.
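To make the generator idea concrete, here is a rough sketch of how a set of generators lets you assign frames that were never part of the clustering. This is not msmbuilder's implementation: the names `assign_to_generators` and `euclidean` are hypothetical, and plain Euclidean distance on 2D points stands in for RMSD between conformations.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points (a stand-in for RMSD here)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_generators(frames, generators, metric=euclidean):
    """Assign each frame to the index of its nearest generator.

    Hypothetical helper for illustration; not an msmbuilder function.
    """
    assignments = []
    for frame in frames:
        distances = [metric(frame, g) for g in generators]
        assignments.append(distances.index(min(distances)))
    return assignments


# Two generators found during clustering (toy data).
generators = [(0.0, 0.0), (5.0, 5.0)]
# Frames that were skipped by the stride and never clustered.
new_frames = [(0.2, -0.1), (4.7, 5.3), (2.0, 2.0)]
print(assign_to_generators(new_frames, generators))
```

Hierarchical clustering produces only a tree over the frames it was given, so there is nothing equivalent to `generators` to compare the skipped frames against.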
Regardless, you should get a more descriptive error here, which we will fix.
Hmmmm. Some food for thought.
Thanks for the help.
Following on from the issue I raised earlier (#295), I'm having trouble with the CalculateImpliedTimescales.py script when working with assignments generated by AssignHierarchical.py. This does not seem to occur when using assignments generated using RMSD hybrid clustering.
The following is the typical output I'm getting when executing the script:
It simply cuts out after the last line without writing out ImpliedTimescales.dat.