ConvertDataToHDF.py returns error when reading dcd file

msmbuilder / msmbuilder-legacy

Legacy release of MSMBuilder

http://msmbuilder.org

GNU General Public License v2.0

ConvertDataToHDF.py returns error when reading dcd file #183

Closed nanjie closed 11 years ago

nanjie commented 11 years ago

Hi, I get the following error messages after running "ConvertDataToHDF.py -s native.pdb -i DCD -S file_dcd" to convert the DCD files in the tutorial directory to HDF files:

{'input_dir': 'DCD', 'min_length': 0, 'pdb': 'native.pdb', 'project': 'ProjectInfo.yaml', 'quiet': False, 'rmsd_cutoff': -1, 'source': 'file_dcd', 'stride': 1}
17:06:00 - WARNING: Sorting trajectory files by numerical values in their names.
17:06:00 - Ensure that numbering is as intended.
17:06:00 - Found 100 traj dirs
['DCD/RUN00/frame0.dcd']
dcdplugin) detected standard 32-bit DCD file of native endianness
dcdplugin) CHARMM format DCD file (also NAMD 2.1 and later)
Traceback (most recent call last):
  File "/home/ndeng/msm2.6.0/bin/ConvertDataToHDF.py", line 5, in <module>
    pkg_resources.run_script('msmbuilder==2.6.0', 'ConvertDataToHDF.py')
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/ConvertDataToHDF.py", line 136, in <module>
    args.min_length, args.stride, rmsd_cutoff)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/ConvertDataToHDF.py", line 70, in run
    pb.get_project().save(projectfn)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/project/builder.py", line 128, in get_project
    self.convert()
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/project/builder.py", line 159, in convert
    traj = self._load_traj(file_list)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/project/builder.py", line 223, in _load_traj
    traj = Trajectory.load_from_dcd(file_list, PDBFilename=self.conf_filename)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 420, in load_from_dcd
    for c in dcd.DCDReader(FilenameList):
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/dcd.py", line 200, in next
    coords = np.asfarray(np.array(xyzvec).reshape(self.natoms.value, 3))
ValueError: total size of new array must be unchanged

kyleabeauchamp commented 11 years ago

Are you sure your PDB and DCD files have the same number of atoms? Perhaps you should strip the solvent out of both DCD and PDB before you try to convert?
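
A quick way to check the first point is to count the atom records in the PDB and compare that number with the atom count the DCD reader reports. The following is only a minimal sketch and not MSMBuilder code: `count_pdb_atoms` is a hypothetical helper, and it assumes a standard PDB file where only the first model matters.

```python
def count_pdb_atoms(pdb_lines):
    """Count ATOM/HETATM records in the first model of a PDB file."""
    n_atoms = 0
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only the first model matters for the atom count
        if line.startswith(("ATOM", "HETATM")):
            n_atoms += 1
    return n_atoms


# Tiny made-up example: two protein atoms and one water oxygen.
example = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00",
    "HETATM    3  O   HOH A   2       9.000   9.000   9.000  1.00  0.00",
    "TER",
    "END",
]
print(count_pdb_atoms(example))  # 3
```

For a real file, something like `count_pdb_atoms(open("native.pdb"))` would give the number to compare against the trajectory.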

nanjie commented 11 years ago

Oh, I was just following the ala-dipeptide tutorial, and ran into this problem with the DCD files that come with the MSMBuilder installation. I believe the native.pdb corresponds to the DCD files in the DCD folder.

kyleabeauchamp commented 11 years ago

This looks like a bug in our trajectory reader. Robert: I also tried this:

from msmbuilder import Trajectory
r = Trajectory.load_from_dcd("./DCD/RUN00/frame0.dcd", "./native.pdb")

I get the same error.

kyleabeauchamp commented 11 years ago

I've also seen similar issues, but I always assumed that I had "bad" DCD files on my system.

rmcgibbo commented 11 years ago

We should accelerate the timetable for moving the mdtraj code into msmbuilder.

kyleabeauchamp commented 11 years ago

IMHO, that's longer term and we should try to patch what we have now.

kyleabeauchamp commented 11 years ago

I think mdtraj usage requires another Major release (3.0?), so we still need to patch the current branch.

rmcgibbo commented 11 years ago

Fair enough. You're probably right. I just hate ctypes code.

-Robert


nanjie commented 11 years ago

Thank you guys. Another question: I have a long 1,000,000-frame trajectory (208 microseconds). Does MSMBuilder work more efficiently with a single long trajectory, or should I break my trajectory into several shorter ones? I plan to build a 25,000-state MSM first, and then use PCCA to lump it into a 1000-macrostate MSM.

Nanjie

kyleabeauchamp commented 11 years ago

I think a single trajectory should be fine. Just make sure your machine has enough memory to hold the entire trajectory.

rmcgibbo commented 11 years ago

The only reason for breaking it up into shorter trajectories would be if you're encountering memory issues. But the MSMBuilder 2.6 release should be significantly more memory efficient during some key stages than previous releases.

-Robert
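
To get a rough feel for the memory question, the size of the raw coordinate array can be estimated from the frame and atom counts. This is a back-of-the-envelope sketch only: it assumes coordinates are held as 4-byte floats, the 20-atom figure is purely illustrative, and clustering will need additional working memory on top of this.

```python
def trajectory_bytes(n_frames, n_atoms, bytes_per_value=4):
    """Bytes needed for an (n_frames, n_atoms, 3) coordinate array."""
    return n_frames * n_atoms * 3 * bytes_per_value


# 1,000,000 frames of a hypothetical 20-atom selection, float32 assumed.
n_bytes = trajectory_bytes(1000000, 20)
print("%.1f MB" % (n_bytes / 1e6))  # 240.0 MB
```

The estimate scales linearly with the number of atoms kept, so loading only the atoms needed for the distance metric reduces it accordingly.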

kyleabeauchamp commented 11 years ago

Nanjie: your trajectory is currently in DCD format, correct? If this bug prevents you from reading your trajectory, I have a couple of possible workarounds.

1. Use VMD to convert your trajectory to TRR, then use Gromacs (trjconv -f traj.trr -s native.pdb -o traj.xtc) to convert your trajectory to XTC, then use that XTC in "ConvertDataToHDF".
2. Use MDTraj (https://github.com/rmcgibbo/mdtraj) to convert directly from DCD to XTC. Then use "ConvertDataToHDF" to convert the trajectory.

I'll also let you know when I figure out the current bug.

kyleabeauchamp commented 11 years ago

Sorry for the issues. Our DCD support was contributed a couple of years ago, but we haven't watched it as closely as our Gromacs XTC support. In the future, we're planning to have a really powerful trajectory reader (MDTraj) that supports everything.

rmcgibbo commented 11 years ago

Once you've installed mdtraj, the "script" to convert between dcd and xtc is basically

#!/usr/bin/env python
from mdtraj import trajectory
t = trajectory.load('mytrajectory.dcd', top='native.pdb')
t.save('mytrajectory.xtc')
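
In later mdtraj releases the loader is exposed at the package top level as `mdtraj.load`, so the same conversion can be sketched as below. Treat this as an assumption to check against the installed version; `xtc_path_for` and `convert_dcd_to_xtc` are illustrative names, not part of mdtraj or MSMBuilder.

```python
import os


def xtc_path_for(dcd_path):
    """Return dcd_path with its extension replaced by .xtc."""
    root, _ext = os.path.splitext(dcd_path)
    return root + ".xtc"


def convert_dcd_to_xtc(dcd_path, topology_path):
    """Load a DCD with its topology and write it back out as XTC."""
    import mdtraj as md  # imported here so xtc_path_for works without mdtraj

    traj = md.load(dcd_path, top=topology_path)
    out_path = xtc_path_for(dcd_path)
    traj.save(out_path)  # output format is inferred from the .xtc extension
    return out_path
```

A call such as `convert_dcd_to_xtc("mytrajectory.dcd", "native.pdb")` would then write `mytrajectory.xtc` next to the input file.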

nanjie commented 11 years ago

Thank you so much, I really appreciate it. I will use your workaround to build the MSM. I have worked with modeling software, so I know it's hard to make it bug-free.

Nanjie

nanjie commented 11 years ago

Thank you Robert, I will definitely try it tomorrow. Best, Nanjie

kyleabeauchamp commented 11 years ago

I have one more comment on your proposed workflow. You mentioned doing a 1000 macrostate MSM using PCCA. I'm not sure that PCCA based methods give good lumping with that many macrostates. I think PCCA type methods are best suited for systems with a strong separation of timescales--e.g. a handful of implied timescales are separated by a large timescale gap. This is based on my intuition and experience.

Either way, you will probably have to experiment with several different numbers of microstates and macrostates. The implied timescales will help guide you there.
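
For reference, the implied timescales follow from the eigenvalues of the transition matrix through the standard relation t_i = -tau / ln(lambda_i), where tau is the lag time. The snippet below is a minimal illustration of that formula, not the CalculateImpliedTimescales implementation; `implied_timescales` is a hypothetical helper that takes precomputed eigenvalues.

```python
import math


def implied_timescales(eigenvalues, lag_time):
    """Convert transition-matrix eigenvalues into implied timescales.

    The stationary eigenvalue (1.0) has no finite timescale and is skipped,
    as are non-positive eigenvalues, whose logarithm is undefined.
    """
    timescales = []
    for lam in eigenvalues:
        if 0.0 < lam < 1.0:
            timescales.append(-lag_time / math.log(lam))
    return sorted(timescales, reverse=True)


# A two-state matrix [[0.9, 0.1], [0.1, 0.9]] has eigenvalues 1.0 and 0.8.
print(implied_timescales([1.0, 0.8], lag_time=1))  # one timescale, about 4.48 lag units
```

A large ratio between consecutive entries of the returned list is the kind of timescale gap referred to above.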

nanjie commented 11 years ago

Thank you Kyle. You could be right. The main reason I want to have a less coarse-grained MSM is that with few macrostates, most of the folding events occur in just one jump, making it harder to see the detailed mechanism, such as which helix/turn forms first.

I think what you said makes sense. I will experiment with micro- and macro- MSMs, by comparing their implied timescales.

Best,

Nanjie

nanjie commented 11 years ago

Hi, Kyle,

Is the atom indexing in AtomIndices.dat zero-based? When I include all the PDB atoms in it (20 C-alpha), Cluster.py fails with an "Index out of bounds" error, see below. So I reduced the atom count in AtomIndices.dat by 1, and then Cluster.py runs OK. Do you know why? Sorry to keep bothering you! This is not a big problem, since I can do without the C-terminal residue in RMSD clustering, but I am curious to know why it behaves this way.

Nanjie

{'alg': 'hybrid', 'hybrid_distance_cutoff': 0.24, 'hybrid_global_iters': 0, 'hybrid_ignore_max_objective': False, 'hybrid_local_num_iters': 50, 'hybrid_num_clusters': None, 'hybrid_too_close_cutoff': 0.0001, 'metric': 'rmsd', 'output_dir': 'Data/', 'project': 'ProjectInfo.yaml', 'quiet': False, 'rmsd_atom_indices': 'AtomIndices.dat', 'stride': 1}
13:02:14 - RMSD metric - loading only the atom indices required
Traceback (most recent call last):
  File "/home/ndeng/msm2.6.0/bin/Cluster.py", line 5, in <module>
    pkg_resources.run_script('msmbuilder==2.6.0', 'Cluster.py')
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/Cluster.py", line 228, in <module>
    main(args, metric)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/Cluster.py", line 203, in main
    trajs = load_trajectories(args.project, args.stride, atom_indices)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/Cluster.py", line 121, in load_trajectories
    traj = project.load_traj(i, stride=stride, atom_indices=atom_indices)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/project/project.py", line 340, in load_traj
    AtomIndices=atom_indices)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 722, in load_trajectory_file
    return Trajectory.load_from_lhdf(Filename, JustInspect=JustInspect, Stride=Stride, AtomIndices=AtomIndices)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 627, in load_from_lhdf
    A = cls.load_from_hdf(TrajFilename, Stride=Stride, AtomIndices=AtomIndices)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 596, in load_from_hdf
    chunk_list = list(cls.enum_chunks_from_hdf(TrajFilename, Stride=Stride, AtomIndices=AtomIndices, ChunkSize=ChunkSize))
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 537, in enum_chunks_from_hdf
    A['XYZList'] = np.array(F.root.XYZList[r0: r1: Stride, AtomIndices])
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/tables/array.py", line 626, in __getitem__
    selection, reorder, shape = self._fancySelection(key)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/tables/array.py", line 572, in _fancySelection
    validate_number(nexp[select_idx], length)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/tables/array.py", line 439, in validate_number
    raise IndexError("Index out of bounds: %d" % num)
IndexError: Index out of bounds: 20
Closing remaining open files: /net/briareus/u2/ndeng/projects/desres/c-alpha/2JOF-0-c-alpha/msm/Trajectories/trj0.lh5... done

kyleabeauchamp commented 11 years ago

Yes, the atom indices are zero-based. Based on your comment, I added a note about this to our tutorial. The updated tutorial won't be available until our next release, though.

kyleabeauchamp commented 11 years ago

Yes, the indices in AtomIndices.dat should be zero-based.
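
The "Index out of bounds: 20" error above is what one-based numbering produces for a 20-atom structure, since the valid zero-based indices run from 0 to 19. A small sketch of that check, with hypothetical helper names that are not part of MSMBuilder:

```python
def out_of_bounds(indices, n_atoms):
    """Return the indices that fall outside the valid range 0 .. n_atoms - 1."""
    return [i for i in indices if i < 0 or i >= n_atoms]


def to_zero_based(one_based_indices):
    """Shift one-based (PDB-style) atom numbers to zero-based indices."""
    return [i - 1 for i in one_based_indices]


n_atoms = 20
one_based = list(range(1, n_atoms + 1))    # 1 .. 20
print(out_of_bounds(one_based, n_atoms))    # [20]
zero_based = to_zero_based(one_based)       # 0 .. 19
print(out_of_bounds(zero_based, n_atoms))   # []
```

Shifting every index down by one keeps all 20 atoms, including the C-terminal residue, instead of dropping the last one.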

kyleabeauchamp commented 11 years ago

I added a note about the zero-based indexing to our tutorial, so hopefully that will make things clearer for people. This won't be available until our next release, though...

nanjie commented 11 years ago

Great, thanks.

nanjie commented 11 years ago

Hi,

The clustering with Cluster.py seems to be stuck in an infinite loop: it has kept printing lines like the ones below for more than 2 hours.

This happens after I typed "Cluster.py rmsd hybrid -d 0.24 -l 50".

Nanjie

15:21:55 - Sweep 8, swapping medoid 3084 (conf 92395) for conf 92394...
15:21:55 - Reject. New f = 0.137366, Old f = 0.137366
15:21:55 - Sweep 8, swapping medoid 3087 (conf 71190) for conf 71145...
15:21:55 - Reject. New f = 0.137367, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3090 (conf 15152) for conf 76291...
15:21:56 - Reject. New f = 0.137366, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3092 (conf 40157) for conf 40159...
15:21:56 - Reject. New f = 0.137367, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3093 (conf 16891) for conf 16892...
15:21:56 - Reject. New f = 0.137367, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3095 (conf 73794) for conf 73793...
15:21:56 - Reject. New f = 0.137366, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3098 (conf 98528) for conf 98526...
15:21:56 - Reject. New f = 0.137366, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3100 (conf 30312) for conf 30313...
15:21:56 - Reject. New f = 0.137366, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3101 (conf 71865) for conf 71866...
15:21:56 - Reject. New f = 0.137366, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3104 (conf 87206) for conf 61561...
15:21:56 - Reject. New f = 0.137366, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3107 (conf 71052) for conf 71054...
15:21:56 - Accept. New f = 0.137366, Old f = 0.137366
15:21:56 - Sweep 8, swapping medoid 3108 (conf 49608) for conf 83394...
15:21:56 - Reject. New f = 0.137366, Old f = 0.137366

rmcgibbo commented 11 years ago

That's not an infinite loop. You asked for 50 rounds of swaps, it's on round 8.
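
One way to tell progress from a hang is to pull the sweep number out of the log and compare it with the requested number of local iterations (the value passed with -l). The snippet below is a hypothetical helper written against the log format shown in this thread, not something shipped with MSMBuilder.

```python
import re

# Matches the "Sweep N, swapping medoid ..." messages printed during clustering.
SWEEP_PATTERN = re.compile(r"Sweep (\d+), swapping medoid")


def latest_sweep(log_lines):
    """Return the highest sweep number seen in the log, or None if absent."""
    sweeps = [int(match.group(1))
              for line in log_lines
              for match in SWEEP_PATTERN.finditer(line)]
    return max(sweeps) if sweeps else None


log = [
    "15:21:55 - Sweep 8, swapping medoid 3084 (conf 92395) for conf 92394...",
    "15:21:55 - Reject. New f = 0.137366, Old f = 0.137366",
]
print(latest_sweep(log))  # 8
```

If the reported sweep keeps increasing over time, the run is advancing toward the requested 50 sweeps.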

nanjie commented 11 years ago

Sorry, my misunderstanding. Yes, it indeed finishes fine.

nanjie commented 11 years ago

Hi, CalculateImpliedTimescales with MLE produces messages like the ones below. Do they signal an error? The plot of implied timescales looks normal, showing the expected separation of timescales, and the slowest timescale does correspond to the MD folding time.

Nanjie

{'assignments': 'Data/Assignments.h5', 'eigvals': 10, 'interval': 5, 'lagtime': '1,100', 'output': 'Data/ImpliedTimescales.no_trim.dat', 'procs': 1, 'quiet': False, 'symmetrize': 'MLE', 'trim': True}
18:12:54 - Getting 10 eigenvalues (timescales) for each lagtime...
18:12:54 - Building MSMs at the following lag times: [1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76, 81, 86, 91, 96]
18:12:55 - Calculating implied timescales at lagtime 1
18:12:57 - Selected component 0 with population 1.000000
18:13:00 - BFGS likelihood maximization terminated after 197 function calls. Initial and final log likelihoods: -181374.305138, -181374.302881.
18:13:01 - Calculating implied timescales at lagtime 6
18:13:03 - Selected component 0 with population 1.000000
18:13:07 - BFGS likelihood maximization terminated after 218 function calls. Initial and final log likelihoods: -226114.556110, -226114.550265.
18:13:07 - Abnormal termination of BFGS likelihood maximization. Error code 2

kyleabeauchamp commented 11 years ago

It should be fine.

nanjie commented 11 years ago

Hi, Kyle,

What are the criteria for choosing the number of k-medoids iterations? From your JCTC paper, I guess they may come from f_max and f_med, which are not printed out during clustering.

Nanjie

nanjie commented 11 years ago

You mean the "Abnormal termination of BFGS" message is OK? And how should I choose a suitable number of k-medoids iterations for Cluster.py? Thanks,

Nanjie

kyleabeauchamp commented 11 years ago

It's going to be very system specific. I would start with just 10 local iterations and 0 global and see if that looks good. If you are unhappy with the model, you might try to increase the number of iterations.

These parameters are probably not that important, so I'd start by choosing something that's fast. You can try other things later if desired.

kyleabeauchamp commented 11 years ago

The BFGS often seems to have termination issues without really changing the results.
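
A quick way to check this on the output above is to compare the initial and final log likelihoods that the optimizer reports. The sketch below only does arithmetic on the two pairs of numbers copied from the log; the helper name `relative_gain` is made up for illustration and is not part of MSMBuilder.

```python
# Log likelihoods copied from the CalculateImpliedTimescales log above,
# keyed by lag time: (initial, final).
LOG_LIKELIHOODS = {
    1: (-181374.305138, -181374.302881),
    6: (-226114.556110, -226114.550265),
}


def relative_gain(initial, final):
    """Improvement in log likelihood as a fraction of its magnitude."""
    return (final - initial) / abs(initial)


for lag in sorted(LOG_LIKELIHOODS):
    initial, final = LOG_LIKELIHOODS[lag]
    print("lag %d: absolute gain %.6f, relative gain %.2e"
          % (lag, final - initial, relative_gain(initial, final)))
```

In both cases the optimizer moved the log likelihood by only a few parts in a hundred million, which is consistent with the estimate being essentially unchanged whether or not BFGS reports a clean termination.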

nanjie commented 11 years ago

Thanks. Yes, the slowest timescale has the right order of magnitude.

Nanjie

nanjie commented 11 years ago

I see. Thanks.

nanjie commented 11 years ago

Hi,

Looking at tProb.mtx, I found that some of the diagonal elements Tii are zero, i.e., they do not appear in the file. These typically belong to weakly populated states, with a population <= 10^-4. A zero Tii for these nodes indicates insufficient sampling of those microstates. In general, the diagonal elements of T should approach their equilibrium populations at long lag times.

My question is: why do those zero Tii's apparently not affect the calculation of the implied timescales? The timescales look fine when compared with the direct folding time from my long MD. I thought that computing the timescales requires solving for eigenvalues, so why is the matrix diagonalization not affected by the presence of zero Tii's?

Nanjie

kyleabeauchamp commented 11 years ago

So the matrix diagonalization should work even if you have zero elements on the diagonal.
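
As a small illustration, here is a sketch with a hypothetical 3-state transition matrix (not taken from any real data) whose middle state has a zero self-transition probability. For a 3x3 row-stochastic matrix one eigenvalue is always 1, so the other two follow from the trace and the determinant; the implied timescale then comes from the usual relation t = -tau / ln(lambda). The sketch assumes the remaining eigenvalues are real, which holds for this matrix.

```python
import math

# Hypothetical row-stochastic matrix; state 1 is weakly populated and T[1][1] == 0.
T = [
    [0.9, 0.1, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.1, 0.9],
]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def nontrivial_eigenvalues(m):
    """The two eigenvalues of a 3x3 stochastic matrix other than 1.

    Their sum is trace(m) - 1 and their product is det(m), so they are
    the roots of a quadratic (assumed here to have real roots).
    """
    total = m[0][0] + m[1][1] + m[2][2] - 1.0
    product = det3(m)
    root = math.sqrt(total * total - 4.0 * product)
    return (total + root) / 2.0, (total - root) / 2.0


def implied_timescale(eigenvalue, lag_time=1.0):
    """Implied timescale in units of the lag time; needs 0 < eigenvalue < 1."""
    return -lag_time / math.log(eigenvalue)


slow, fast = nontrivial_eigenvalues(T)
print("eigenvalues: 1.0, %.3f, %.3f" % (slow, fast))
print("slowest implied timescale: %.2f lag times" % implied_timescale(slow))
```

The zero on the diagonal causes no trouble: the slow eigenvalue and its timescale are perfectly well defined.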

At long timescales, all transition matrix elements should approach the equilibrium values--not just the diagonal elements. Thus, accurate statistical estimation becomes more and more challenging as you increase the timescale. The fact that we have any zero values at long timescale is an issue.
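
This convergence can be sketched by raising a small transition matrix to a high power. The 3-state matrix below is hypothetical and chosen so that its stationary distribution is exactly (5/11, 1/11, 5/11); every row of T^n, off-diagonal entries included, approaches that distribution as n grows.

```python
# Hypothetical row-stochastic matrix with a zero diagonal entry for state 1.
T = [
    [0.9, 0.1, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.1, 0.9],
]

# Stationary distribution of T (satisfies pi * T == pi for this matrix).
STATIONARY = [5.0 / 11.0, 1.0 / 11.0, 5.0 / 11.0]


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(m, n):
    """m raised to the non-negative integer power n by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, m)
    return result


for lag in (1, 10, 200):
    middle_row = matrix_power(T, lag)[1]
    print("lag %3d, row of state 1: %s" % (lag, [round(x, 4) for x in middle_row]))
```

At lag 1 the self-transition probability of state 1 is zero, but at long lag times it tends to the state's equilibrium population, 1/11.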

This is actually one of the key underlying problems in MSM construction. Achieving accurate kinetics requires long lagtimes and many microstates. However, long lagtimes and many microstates lead to large statistical uncertainties--and null values in Tij.

At this point, we don't really have a fully automated way to help you balance this trade-off.
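
To make the trade-off concrete, here is a sketch that counts transitions in a synthetic assignment trajectory and reports how many entries of the count matrix stay empty. It assumes a sliding-window count, which is one common convention; the function names and the uniformly random trajectory are purely illustrative and do not reflect MSMBuilder's internals or real dynamics.

```python
import random


def count_matrix(assignments, n_states, lag):
    """Sliding-window transition counts: counts[i][j] is the number of i -> j jumps at the given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for start, end in zip(assignments[:-lag], assignments[lag:]):
        counts[start][end] += 1
    return counts


def zero_fraction(counts):
    """Fraction of count-matrix entries that were never observed."""
    entries = [c for row in counts for c in row]
    return sum(1 for c in entries if c == 0) / float(len(entries))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
N_FRAMES = 2000
LAG = 10

for n_states in (10, 100):
    trajectory = [rng.randrange(n_states) for _ in range(N_FRAMES)]
    counts = count_matrix(trajectory, n_states, LAG)
    total = sum(sum(row) for row in counts)
    print("%3d states: %d counts over %d entries, %.0f%% of entries empty"
          % (n_states, total, n_states ** 2, 100 * zero_fraction(counts)))
```

A trajectory of N frames yields only N - lag counts, while the number of matrix entries grows as the square of the number of states, so with 100 states at least 80% of the entries must be empty here regardless of the dynamics.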

nanjie commented 11 years ago

Hi Kyle, thanks for the insights. It's a relief to know that a few zero diagonal elements do not prevent matrix diagonalization. Yes, all Tij approach the equilibrium populations at sufficiently long lag times. If my trajectory were long enough, I could afford to build a very fine-grained model that shows Markov behavior even at short lag times and could be propagated to get the long-time dynamics. But I suspect ~100 microseconds is hardly enough statistics for, say, a 50,000-state MSM. Nanjie

kyleabeauchamp commented 11 years ago

I'm closing this issue.

Nanjie: we fixed this bug in our git repository. It's not yet released as a zip file. We'll release a bugfix release soon. If you need it sooner, let us know and we can hurry up.

nanjie commented 11 years ago

Thanks a lot. Actually the workaround (trr then xtc) works well, so I am not held up by this bug. But it's good to have a fix. Best, Nanjie

nanjie commented 11 years ago

Hi, Kyle,

I was trying to generate a 500-state MSM from a 31,000-state MSM using PCCA+, but got the following error after 50 minutes of computing time. Could you take a look and tell me what went wrong? Should I pick a smaller number of macrostates? The 31,000-state MSM looks fine, as indicated by its implied timescales, so I guess the error must come from PCCA+.

Thanks, Nanjie

{'algorithm': 'PCCA+', 'assignments': 'Data/Assignments.Fixed.h5', 'do_minimization': True, 'flux_cutoff': None, 'num_macrostates': 500, 'objective_function': 'crisp_metastability', 'output_dir': 'Macro_500/', 'quiet': False, 'tProb': 'Data/tProb.mtx'}
12:13:27 - Creating directory Macro_1000
12:13:27 - Running PCCA+...
12:22:19 - Minimizing PCCA+ objective function.
12:22:21 - Initial value of objective function: f = inf
Warning: Maximum number of iterations exceeded.
Traceback (most recent call last):
  File "/home/ndeng/msm2.6.0/bin/PCCA.py", line 5, in <module>
    pkg_resources.run_script('msmbuilder==2.6.0', 'PCCA.py')
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/PCCA.py", line 115, in <module>
    do_minimization=args.do_minimization)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/PCCA.py", line 51, in run_pcca_plus
    do_minimization=do_minimization, objective_function=objective_function)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 223, in __init__
    self.lump(do_minimization=do_minimization)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 249, in lump
    A = self.optimize_A(A)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 285, in optimize_A
    alpha = scipy.optimize.fmin(obj, alpha, full_output=True, xtol=1E-4, ftol=1E-4, maxfun=5000, maxiter=100000)[0]
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 360, in fmin
    res = _minimize_neldermead(func, x0, args, callback=callback, **opts)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 424, in _minimize_neldermead
    fsim[0] = func(x0)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 268, in function_wrapper
    return function(x, *args)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 277, in <lambda>
    obj = lambda x: -1 * self.objective_function(x, self.T, self.right_eigenvectors, square_map, self.populations)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 87, in crisp_metastability
    A, chi_fuzzy, mapping = calculate_fuzzy_chi(alpha, square_map, right_eigenvectors)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 499, in calculate_fuzzy_chi
    A = to_square(alpha, square_map) # Convert parameter vector into matrix A
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 365, in to_square
    return alpha[square_map]
IndexError: index 1 is out of bounds for size 1

kyleabeauchamp commented 11 years ago

I would definitely pick fewer macrostates. I just don't think PCCA and PCCA+ were designed to do anything but extract a handful of slow states.

I'm not sure why you should get an error, though. Try a smaller model and see what happens.
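
A rough size estimate may help explain why large macrostate counts are impractical, although it does not explain the IndexError itself. The traceback shows the minimization going through scipy.optimize.fmin (Nelder-Mead) with maxfun=5000. The sketch below assumes, as in the standard PCCA+ formulation, that the optimizer varies an (n-1) x (n-1) block of the n x n transformation matrix A; whether MSMBuilder parameterizes it exactly this way is an assumption, and `free_parameters` is a made-up helper.

```python
MAXFUN = 5000  # function-evaluation cap from the scipy.optimize.fmin call in the traceback


def free_parameters(n_macrostates):
    """Number of optimized entries, ASSUMING an (n-1) x (n-1) free block of A."""
    return (n_macrostates - 1) ** 2


for n in (4, 5, 20, 500):
    dim = free_parameters(n)
    vertices = dim + 1  # a Nelder-Mead simplex in dim dimensions has dim + 1 vertices
    note = "exceeds maxfun" if vertices > MAXFUN else "fits within maxfun"
    print("%3d macrostates: %6d parameters, %6d simplex vertices (%s)"
          % (n, dim, vertices, note))
```

Under that assumption, 500 macrostates means roughly 249,000 parameters, so even evaluating the objective once per simplex vertex would take far more than 5000 evaluations, whereas 4 or 5 macrostates give only 9 or 16 parameters.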

nanjie commented 11 years ago

Kyle, when I use far fewer macrostates (e.g. 20), the PCCA job finishes without printing any error. But the PCCA+ job is still running and has printed the following warning:

{'algorithm': 'PCCA+', 'assignments': 'Data/Assignments.Fixed.h5', 'do_minimization': True, 'flux_cutoff': None, 'num_macrostates': 20, 'objective_function': 'crisp_metastability', 'output_dir': 'Macro_20.PCCA+/', 'quiet': False, 'tProb': 'Data/tProb.mtx'}
16:06:38 - Creating directory Macro_20.PCCA+
16:06:38 - Running PCCA+...
16:06:50 - Minimizing PCCA+ objective function.
16:06:50 - Initial value of objective function: f = -13.212701
Warning: Maximum number of iterations exceeded.

The PCCA+ job is still running after some time and has not returned. Does this indicate an error?

Nanjie

kyleabeauchamp commented 11 years ago

Hi,

PCCA+ should be fairly slow for 20 macrostates. Why not start at 4 or 5 and see how the timings change as you increase the number of states?

I suspect a couple hours for 20 might be reasonable, but it depends on the details of your machine and microstate.

nanjie commented 11 years ago

I see, that's very helpful to know. I am hesitant to go to an even lower resolution, as I am afraid it might mistakenly lump the native state with the unfolded state.

Also, using PCCA rather than PCCA+, I found that in the 20-state macrostate model the slowest implied timescale is 2-3 times faster than the corresponding mode calculated from the 31,000-state MSM, a sign of lumping error. The other modes are also faster by a similar ratio.

Would PCCA+ produce lumping with smaller error?

Thanks, Nanjie

kyleabeauchamp commented 11 years ago

I'd say a 2-3X acceleration is probably the usual amount of error introduced with PCCA(+) type methods.
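
A toy calculation shows how lumping alone can speed up the apparent kinetics. The sketch below uses a hypothetical 3-state transition matrix (slow eigenvalue 0.9, stationary distribution (5/11, 1/11, 5/11)) and a population-weighted coarse-graining into two macrostates; the `lump` helper is illustrative and is neither PCCA nor PCCA+.

```python
import math

# Hypothetical 3-state model; (1, 0, -1) is a right eigenvector with eigenvalue 0.9.
T = [
    [0.9, 0.1, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.1, 0.9],
]
STATIONARY = [5.0 / 11.0, 1.0 / 11.0, 5.0 / 11.0]
MICRO_TIMESCALE = -1.0 / math.log(0.9)  # slowest timescale of T, in lag times


def lump(t, pi, groups):
    """Population-weighted coarse-graining of t onto the given groups of states."""
    macro = [[0.0] * len(groups) for _ in groups]
    for a, group_a in enumerate(groups):
        population = sum(pi[i] for i in group_a)
        for b, group_b in enumerate(groups):
            flux = sum(pi[i] * t[i][j] for i in group_a for j in group_b)
            macro[a][b] = flux / population
    return macro


def two_state_timescale(m, lag_time=1.0):
    """Implied timescale of a 2x2 stochastic matrix (second eigenvalue is 1 - m01 - m10)."""
    eigenvalue = 1.0 - m[0][1] - m[1][0]
    return -lag_time / math.log(eigenvalue)


# Lump the intermediate state 1 together with state 0.
macro = lump(T, STATIONARY, [[0, 1], [2]])
macro_timescale = two_state_timescale(macro)
print("microstate timescale: %.2f" % MICRO_TIMESCALE)
print("macrostate timescale: %.2f" % macro_timescale)
print("speed-up from lumping: %.2fx" % (MICRO_TIMESCALE / macro_timescale))
```

Here the lumped model relaxes roughly twice as fast as the microstate model, even though both are built from exactly the same transition probabilities.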

nanjie commented 11 years ago

I see. Yes, the PCCA+ calculation finished OK, with an acceleration ratio of 2, which is better than the 3 obtained with PCCA. Thanks.

nanjie commented 11 years ago

Hi, Kyle,

Now that I have a 20-state MSM generated from a 31,000-state MSM by PCCA+, I wish to obtain the tProb matrix for the 20-state MSM at some lag time (say 10 ns). What is the command to compute the transition probability matrix for the macrostate MSM? Do I run BuildMSM first?

Thanks,

Nanjie

kyleabeauchamp commented 11 years ago

Yes, BuildMSM.py is the correct command. You just need to tell it to use the output of PCCA+ (MacroAssignments.h5) as its assignments.

Assuming your trajectory data is stored every nanosecond, your command should look something like:

BuildMSM.py -l 10 -a MacroModel/MacroAssignments.h5 -o MacroModel/
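The command above uses `-l 10` for a 10 ns lag time because frames are assumed to be saved every nanosecond, which suggests the lag is given in frames. A small sketch of that conversion, under that assumption, is shown below. The helper `lag_in_frames` is hypothetical and not part of MSMBuilder; only the flags already shown in the command are used.

```python
def lag_in_frames(lag_time_ns, frame_interval_ns):
    """Convert a physical lag time to a whole number of trajectory frames.

    Assumes the lag time option counts frames, as the command above implies.
    """
    if frame_interval_ns <= 0:
        raise ValueError("frame interval must be positive")
    frames = lag_time_ns / frame_interval_ns
    # Reject lag times that are not a whole, positive number of frames.
    if frames < 1 or abs(frames - round(frames)) > 1e-9:
        raise ValueError("lag time must be a positive multiple of the frame interval")
    return int(round(frames))


# 10 ns lag with frames saved every 1 ns, as assumed in the reply.
lag = lag_in_frames(10.0, 1.0)
command = "BuildMSM.py -l %d -a MacroModel/MacroAssignments.h5 -o MacroModel/" % lag
print(command)
```

If the trajectories were instead saved every 0.5 ns, the same 10 ns lag time would correspond to 20 frames.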

nanjie commented 11 years ago

Thanks so much Kyle.

On Wed, May 1, 2013 at 3:55 PM, kyleabeauchamp notifications@github.comwrote:

Yes, BuildMSM.py is the correct command. You just need to tell it to use the output of PCCA+ as its assignments (MacroAssignments.h5)

Assuming your trajectory data is stored every nanosecond, your command should look something like
BuildMSM.py -l 10 -a MacroModel/MacroAssignments.h5 -o MacroModel/
On 05/01/2013 12:50 PM, nanjie wrote:

Hi, Kyle,

Now that I have 20-state msm which is generated from a 31,000 msm by pcca+, I wish to obtain the tProb matrix for the 20-state msm at some lagtime (say 10 ns). what is is command to compute the transition probability matrix for the macro- msm? Do I type BuildMsm first?

Thanks,

Nanjie

On Tue, Apr 30, 2013 at 5:07 PM, Nanjie Deng nanjie.deng@gmail.com wrote:

I see. Yes, PCCA+ calculation finished OK, with acceleration ratio of 2, which is better than 3 obtained with PCCA. Thanks.

On Tue, Apr 30, 2013 at 4:38 PM, kyleabeauchamp notifications@github.comwrote:

I'd say a 2-3X acceleration is probably the usual amount of error introduced with PCCA(+) type methods.

On 04/30/2013 01:33 PM, nanjie wrote:

I see, that's very helpful to know. I am hesitant to go even lower resolution, as I am afraid it might mistakenly lump native state with unfolded state.

Also, using PCCA rather than PCCA+, I found that on 20-node macro model, the slowest implied timescale is 2-3 times faster than the corresponding mode calculated using the 31,000 msm, a sign of lumping error. other modes are also faster by similar ratio.

Would PCCA+ produce lumping with smaller error?

Thanks, Nanjie

On Tue, Apr 30, 2013 at 4:18 PM, kyleabeauchamp notifications@github.comwrote:

Hi,

PCCA+ should be fairly slow for 20 macrostates. Why not start at 4 or 5 and see how the timings change as you increase the number of states.

I suspect a couple hours for 20 might be reasonable, but it depends on the details of your machine and microstate.

On 04/30/2013 01:15 PM, nanjie wrote:

Kyle, when I use much smaller macrostates (e.g. 20), using PCCA the job can finish without printing error. But use of PCCA+ still running and printing out following warning: {'algorithm': 'PCCA+', 'assignments': 'Data/Assignments.Fixed.h5', 'do_minimization': True, 'flux_cutoff': None, 'num_macrostates': 20, 'objective_function': 'crisp_metastability', 'output_dir': 'Macro_20.PCCA+/', 'quiet': False, 'tProb': 'Data/tProb.mtx'} 16:06:38 - Creating directory Macro_20.PCCA+ 16:06:38 - Running PCCA+... 16:06:50 - Minimizing PCCA+ objective function. 16:06:50 - Initial value of objective function: f = -13.212701 Warning: Maximum number of iterations exceeded.

The PCCA+ job is still running after some time now and not returning, does this indicates an error?

Nanjie

On Tue, Apr 30, 2013 at 1:08 PM, kyleabeauchamp notifications@github.comwrote:

I would definitely pick fewer macrostates. I just don't think PCCA and PCCA+ were designed to anything but extract a handful of slow states.

I'm not sure why you should get an error, though. Try a smaller model and see what happens.

On 04/30/2013 10:02 AM, nanjie wrote:

Hi, Kyle,

I was trying to generate a 500 state msm from a 31,000 state msm using pcca+, but got the following error after 50 minutes computing time, could you take a look and tell me what went wrong? Should I pick a smaller number of macrostates? The 31,000 state msm looks fine, as indicated by its implied time scales. So the error must be from pcca+, I guess.

Thanks, Nanjie

{'algorithm': 'PCCA+', 'assignments': 'Data/Assignments.Fixed.h5', 'do_minimization': True, 'flux_cutoff': None, 'num_macrostates': 500, 'objective_function': 'crisp_metastability', 'output_dir': 'Macro_500/', 'quiet': False, 'tProb': 'Data/tProb.mtx'}
12:13:27 - Creating directory Macro_1000
12:13:27 - Running PCCA+...
12:22:19 - Minimizing PCCA+ objective function.
12:22:21 - Initial value of objective function: f = inf
Warning: Maximum number of iterations exceeded.
Traceback (most recent call last):
  File "/home/ndeng/msm2.6.0/bin/PCCA.py", line 5, in
    pkg_resources.run_script('msmbuilder==2.6.0', 'PCCA.py')
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/PCCA.py", line 115, in
    do_minimization=args.do_minimization)
  File "/net/briareus/u2/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/PCCA.py", line 51, in run_pcca_plus
    do_minimization=do_minimization, objective_function=objective_function)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 223, in init
    self.lump(do_minimization=do_minimization)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 249, in lump
    A = self.optimize_A(A)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 285, in optimize_A
    alpha = scipy.optimize.fmin(obj, alpha, full_output=True, xtol=1E-4, ftol=1E-4, maxfun=5000, maxiter=100000)[0]
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 360, in fmin
    res = _minimize_neldermead(func, x0, args, callback=callback, **opts)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 424, in _minimize_neldermead
    fsim[0] = func(x0)
  File "/home/ndeng/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 268, in function_wrapper
    return function(x, *args)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 277, in
    obj = lambda x: -1 * self.objective_function(x, self.T, self.right_eigenvectors, square_map, self.populations)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 87, in crisp_metastability
    A, chi_fuzzy, mapping = calculate_fuzzy_chi(alpha, square_map, right_eigenvectors)
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 499, in calculate_fuzzy_chi
    A = to_square(alpha, square_map) # Convert parameter vector into matrix A
  File "/home/ndeng/msm2.6.0/lib/python2.7/site-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/lumping/pcca_plus.py", line 365, in to_square
    return alpha[square_map]
IndexError: index 1 is out of bounds for size 1

Reply to this email directly or view it on GitHub

https://github.com/SimTk/msmbuilder/issues/183#issuecomment-17240030.
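
The last frame of the traceback above fails on `alpha[square_map]`, which is NumPy integer-array indexing. A small sketch of how that pattern raises this kind of IndexError follows. The values of `alpha` and `square_map` here are hypothetical stand-ins chosen for illustration; the thread does not show the real ones, and this does not establish why they ended up mismatched in the 500-macrostate run.

```python
import numpy as np

# Hypothetical stand-ins: a one-element parameter vector and an index map
# that refers to position 1, which does not exist in that vector.
alpha = np.array([0.5])
square_map = np.array([[0, 1], [1, 0]])

try:
    # Same indexing pattern as the failing line in to_square().
    A = alpha[square_map]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```

The exact wording of the message differs between NumPy versions, but the condition is the same: an index in the map is larger than the parameter vector allows.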

nanjie commented 11 years ago

Hi, Kyle,

I want to compute the RMSD of every conformation in a given macrostate (after PCCA) relative to a reference structure, so the output will be a list of RMSD values. I guess I first need to use SaveStructures.py to save all the conformations in that macrostate into a directory. How do I tell SaveStructures.py to save all the conformations in the macrostate? The script seems to save only random conformations.

Also, CalculateRMSD.py expects a reference structure and a set of macrostate structures for this purpose. Should all the macrostate structures be saved as a single PDB or as separate PDBs?

Thanks,

Nanjie
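
The calculation described above can be sketched in plain NumPy, independent of the MSMBuilder scripts. This is a sketch under stated assumptions, not the MSMBuilder API: it assumes the macrostate assignments are available as a 2-D integer array (trajectories by frames, padded with -1) and the coordinates as an array of shape (n_trajectories, n_frames, n_atoms, 3). The helper names `frames_in_state` and `aligned_rmsd` and the toy data are made up for illustration.

```python
import numpy as np


def frames_in_state(assignments, state):
    """Return (trajectory, frame) index pairs assigned to `state`."""
    traj_idx, frame_idx = np.where(assignments == state)
    return list(zip(traj_idx.tolist(), frame_idx.tolist()))


def aligned_rmsd(coords, reference):
    """RMSD after optimal superposition (Kabsch algorithm).

    Both inputs are (n_atoms, 3) arrays with the same atom order.
    """
    p = coords - coords.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    # Flip the smallest singular value if the best fit would be a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = ((p ** 2).sum() + (q ** 2).sum() - 2.0 * s.sum()) / len(p)
    return float(np.sqrt(max(msd, 0.0)))


# Toy data: 2 trajectories, 4 frames each, 4 atoms; -1 marks padding.
assignments = np.array([[0, 1, 1, -1],
                        [1, 0, 2, 2]])
reference = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
xyz = np.empty((2, 4, 4, 3))
xyz[:] = reference @ rot_z.T + np.array([1.0, 2.0, 3.0])  # rigidly moved copy
xyz[0, 2] = 2.0 * reference  # one distorted frame with a nonzero RMSD

pairs = frames_in_state(assignments, state=1)
rmsds = [aligned_rmsd(xyz[t, f], reference) for t, f in pairs]
print(pairs)
print(rmsds)
```

Selecting frames by index in this way covers every conformation in the macrostate rather than a random sample, and it yields one RMSD value per frame without writing intermediate PDB files.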