Closed. s-gordon closed this issue 11 years ago.
The .lh5 files are corrupted somehow -- the error is coming from libhdf5, which is a dependency of a dependency of msmbuilder.
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 139968538093312:
#000: ../../../src/H5Dio.c line 153 in H5Dread(): selection+offset not within extent
major: Dataspace
minor: Out of range
Do you have the command line program `h5ls`? (If you're using Ubuntu, you can install it with `sudo apt-get install hdf5-tools`. If you have Enthought Python, it should be installed by default.) `h5ls` is basically `ls` for the HDF5 format, so it should show you what data is in the file.
If so, can you try running
$ h5ls Trajectories/trj0.lh5
You can also try running this on one of the trajectories from the tutorial, and compare the output.
Another option is to try to do the conversion again. There could be a problem in your trajconv/catdcd pipeline. You might try using MDTraj's mdconvert: see http://rmcgibbo.github.io/mdtraj/ for details.
-Robert
On Sun, Jun 30, 2013 at 6:50 PM, gordo1 notifications@github.com wrote:
Hi all,

After completing the MSMBuilder2 tutorials, I tried to apply the same steps to my own data set of trajectories. After converting these frames to XTC (I was having issues with the DCD reader) using a combination of catdcd and Gromacs's trjconv tools, I converted my trajectories into .lh5 files using the ConvertDataToHDF.py script, which appeared to complete normally. Next, I tried to cluster the data set using the command:

python2.7 ../scripts/Cluster.py rmsd hybrid -d 0.045 -l 50
...which worked for the Tutorial data set previously. This spits out the following error log:
MSMBuilder version 2.6.0.dev-Unknown
See file AUTHORS for a list of MSMBuilder contributors.
Copyright 2011 Stanford University. MSMBuilder comes with ABSOLUTELY NO WARRANTY. MSMBuilder is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
Please cite the following references: GR Bowman, X Huang, and VS Pande. Methods 2009. Using generalized ensemble simulations and Markov state models to identify conformational states. KA Beauchamp, GR Bowman, TJ Lane, L Maibaum, IS Haque, VS Pande. JCTC 2011. MSMBuilder2: Modeling Conformational Dynamics at the Picosecond to Millisecond Timescale IS Haque, KA Beauchamp, VS Pande. In preparation.
A Fast 3 x N Matrix Multiply Routine for Calculation of Protein RMSD.
{'alg': 'hybrid', 'hybrid_distance_cutoff': 0.045, 'hybrid_global_iters': 0, 'hybrid_ignore_max_objective': False, 'hybrid_local_num_iters': 50, 'hybrid_num_clusters': None, 'hybrid_too_close_cutoff': 0.0001, 'metric': 'rmsd', 'output_dir': 'Data/', 'project': 'ProjectInfo.yaml', 'quiet': False, 'rmsd_atom_indices': 'AtomIndices.dat', 'stride': 1}
11:39:03 - RMSD metric - loading only the atom indices required
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 139968538093312:
  #000: ../../../src/H5Dio.c line 153 in H5Dread(): selection+offset not within extent
    major: Dataspace
    minor: Out of range
Traceback (most recent call last):
  File "../scripts/Cluster.py", line 228, in <module>
    main(args, metric)
  File "../scripts/Cluster.py", line 203, in main
    trajs = load_trajectories(args.project, args.stride, atom_indices)
  File "../scripts/Cluster.py", line 121, in load_trajectories
    traj = project.load_traj(i, stride=stride, atom_indices=atom_indices)
  File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/project/project.py", line 340, in load_traj
    AtomIndices=atom_indices)
  File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 722, in load_trajectory_file
    return Trajectory.load_from_lhdf(Filename, JustInspect=JustInspect, Stride=Stride, AtomIndices=AtomIndices)
  File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 627, in load_from_lhdf
    A = cls.load_from_hdf(TrajFilename, Stride=Stride, AtomIndices=AtomIndices)
  File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 596, in load_from_hdf
    chunk_list = list(cls.enum_chunks_from_hdf(TrajFilename, Stride=Stride, AtomIndices=AtomIndices, ChunkSize=ChunkSize))
  File "/usr/local/lib/python2.7/dist-packages/msmbuilder-2.6.0-py2.7-linux-x86_64.egg/msmbuilder/Trajectory.py", line 507, in enum_chunks_from_hdf
    A['AtomID'] = np.array(F.root.AtomID[AtomIndices], dtype=np.int32)
  File "/usr/lib/python2.7/dist-packages/tables/array.py", line 689, in __getitem__
    arr = self._readCoords(coords)
  File "/usr/lib/python2.7/dist-packages/tables/array.py", line 792, in _readCoords
    self._g_readCoords(coords, nparr)
  File "hdf5Extension.pyx", line 1134, in tables.hdf5Extension.Array._g_readCoords (tables/hdf5Extension.c:9869)
tables.exceptions.HDF5ExtError: Problems reading the array data.
Closing remaining open files: /home/ /Downloads/msmbuilder-2.6.0_3/Tutorial/Trajectories/trj0.lh5... done

At the conclusion of this, no files are created in the ./Data directory.
I've farmed through the relevant scripts to try and diagnose what the issue might be, but nothing screams out. The trajectories involve only a small molecule (9 atoms), and the total data set is equivalent to roughly 50 Mb. I'd greatly appreciate it if anyone can help me figure out what the issue might be. Cheers.
Reply to this email directly or view it on GitHub: https://github.com/SimTk/msmbuilder/issues/217
It's very puzzling that you are able to convert the XTC files that come with MSMBuilder, but you are not able to convert the XTC files that your pipeline generates. Are you sure that your XTC files are properly formatted?
Thanks for the fast responses! rmcgibbo - I've just installed the hdf5-tools package and successfully tested h5ls as you described. The output using the tutorial data set is as follows:
AtomID        Dataset {22/8192}
AtomNames     Dataset {22/16384}
ChainID       Dataset {22/65536}
ResidueID     Dataset {22/8192}
ResidueNames  Dataset {22/16384}
XYZList       Dataset {501, 22, 3}
When applying this to my own data set, I get the following output:
AtomID        Dataset {9/8192}
AtomNames     Dataset {9/32768}
ChainID       Dataset {9/65536}
ResidueID     Dataset {9/8192}
ResidueNames  Dataset {9/16384}
XYZList       Dataset {5051, 9, 3}
...which matches up pretty well with what I got with the tutorial trajectory files.
MDTraj was my next point of reference. I'll give it a go and report back when I've got the results.
Do you have a single trajectory or several? Could it be that one trajectory is somehow corrupted?
If so, it might make sense to try loading the trajectories in an interactive python session, one by one.
R = Trajectory.load_from_lhdf("./Trajectories/trj0.lh5") etc
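A throwaway sketch of that one-by-one check (`find_bad_trajectories` and `stub_load` are made up for illustration; in practice you would pass `Trajectory.load_from_lhdf` as the loader and glob the `Trajectories/` directory for the paths):

```python
def find_bad_trajectories(paths, load):
    """Attempt to load each trajectory; collect the ones that raise."""
    bad = []
    for path in sorted(paths):
        try:
            load(path)
        except Exception as err:
            bad.append((path, str(err)))
    return bad

# Stand-in loader for illustration; with msmbuilder you would pass
# Trajectory.load_from_lhdf and paths = glob("Trajectories/trj*.lh5").
def stub_load(path):
    if path == "trj3.lh5":
        raise IOError("selection+offset not within extent")

paths = ["trj0.lh5", "trj1.lh5", "trj2.lh5", "trj3.lh5"]
print(find_bad_trajectories(paths, stub_load))
# -> [('trj3.lh5', 'selection+offset not within extent')]
```

If one file shows up there and the others load fine, the problem is that trajectory rather than the pipeline as a whole.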
kyleabeauchamp - my thoughts exactly. At the moment I'm using catdcd to go from DCD -> TRR, then trjconv to go from TRR -> XTC.
I haven't had a look at the internals of the XTC files yet.
I'm just following up on what you've suggested in your second comment. Will report back soon.
No, I've figured it out. Wait two seconds...
The problem is that your `AtomIndices.dat` contains too many indices.
rmcgibbo@Roberts-MacBook-Pro-2 ~
$ cat test.py
import tables
import numpy as np
handle = tables.openFile('test.h5', 'w')
# save ten numbers to the file
handle.createArray(where='/', name='x', object=np.arange(10))
# read the numbers back out, but try to overread the buffer
indices_to_grab = np.arange(100)
handle.root.x[indices_to_grab]
rmcgibbo@Roberts-MacBook-Pro-2 ~
$ python test.py
HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
#000: H5Dio.c line 153 in H5Dread(): selection+offset not within extent
major: Dataspace
minor: Out of range
Traceback (most recent call last):
File "test.py", line 11, in <module>
handle.root.x[indices_to_grab]
File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/tables/array.py", line 689, in __getitem__
arr = self._readCoords(coords)
File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/tables/array.py", line 792, in _readCoords
self._g_readCoords(coords, nparr)
File "hdf5Extension.pyx", line 1134, in tables.hdf5Extension.Array._g_readCoords (tables/hdf5Extension.c:9869)
tables.exceptions.HDF5ExtError: Problems reading the array data.
Closing remaining open files: test.h5... done
I'm sort of surprised that pytables doesn't catch this exception in a nicer way and report it as an IndexError, but that's the same one that you reported. Presumably there are numbers greater than 8 in your `AtomIndices.dat` file?
Looks to be the case.
$ cat AtomIndices.dat
1
4
5
6
8
10
14
15
16
18
So the `AtomIndices.dat` file is supposed to list the (zero-based) indices of the atoms that you want to use in the RMSD computation. So if you have exchangeable atoms like methyl hydrogens that you want to discard, you wouldn't list them in that file. I'm not sure what your system is, but presumably if there are only 9 atoms you probably want to include them all? In that case, the file should just list the integers zero through eight.
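For a 9-atom system that would look like the sketch below. The `validate` helper is purely illustrative (not part of MSMBuilder), but it is the bounds check that would have caught this error before HDF5 did:

```python
n_atoms = 9  # the poster's small molecule

# AtomIndices.dat: one zero-based atom index per line.
with open("AtomIndices.dat", "w") as f:
    for i in range(n_atoms):
        f.write("%d\n" % i)

def validate(indices, n_atoms):
    """Return the indices that fall outside [0, n_atoms)."""
    return [i for i in indices if not 0 <= i < n_atoms]

# The file from the report, checked against a 9-atom trajectory:
print(validate([1, 4, 5, 6, 8, 10, 14, 15, 16, 18], 9))
# -> [10, 14, 15, 16, 18]
```

Any nonempty result from a check like this means pytables will be asked to read past the end of the `AtomID` array, which is exactly the `selection+offset not within extent` failure above.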
Thanks a million everyone. I've been struggling with this for a few days. Hard to believe it was something so simple.
I've amended AtomIndices.dat to reflect relevant atoms in my small molecule, and everything appears to be running smoothly now with clustering.
We will fix this error message in the next release to be more informative.
It's only easy because we live inside this codebase, so we know most of the failure modes. For the record, I checked mdtraj (which msmbuilder is going to use in the near future), and it gives an informative error message here.
Okay. I'm going to close this. Looks like the issue was resolved.
Hi all, I've been facing issues trying to use .h5 file datasets. I keep getting this error for the code:

ds = dataset('traj-0000.h5')
len(ds)
/anaconda3/envs/MDS/lib/python2.7/site-packages/tables/group.pyc in _g_check_has_child(self, name)
    396             raise NoSuchNodeError(
    397                 "group %s does not have a child named %s"
--> 398                 % (self._v_pathname, name))
    399         return node_type
    400

NoSuchNodeError: group / does not have a child named /arr_0
Even though the h5 file is in the same directory as the Jupyter notebook. Any help/fixes will be highly appreciated.
I think MSMBuilder doesn't account for the latest PyTables update. Here's a link to a [similar PyTables issue](https://github.com/PV-Lab/bayesim/issues/1).
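If the problem really is PyTables 3's rename of its camelCase API to PEP 8 names (`openFile` -> `open_file`, `createArray` -> `create_array`, and so on), one workaround is to resolve whichever name exists at runtime instead of pinning a PyTables version. `compat_attr` and the stand-in `legacy` object below are purely illustrative; with real PyTables you would pass the `tables` module and the names `"open_file"`/`"openFile"`:

```python
import types

def compat_attr(module, new_name, old_name):
    """Prefer the new-style attribute name, fall back to the legacy one.

    PyTables 3.x renamed its camelCase API to PEP 8 names, which breaks
    older code written against the 2.x spelling.
    """
    attr = getattr(module, new_name, None)
    if attr is None:
        attr = getattr(module, old_name)
    return attr

# Demo with a stand-in module exposing only the legacy spelling:
legacy = types.SimpleNamespace(openFile=lambda path, mode="r": ("opened", path))
open_file = compat_attr(legacy, "open_file", "openFile")
print(open_file("traj-0000.h5"))
# -> ('opened', 'traj-0000.h5')
```

Note that the `NoSuchNodeError` about `/arr_0` may instead mean the file simply wasn't written with an `arr_0` node; `h5ls traj-0000.h5` will show what nodes it actually contains.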