Closed germanbarcenas closed 2 years ago
Hi, can you try the trunk version of MDTraj and see if that helps? There was an issue with mdtraj.load recently which is now fixed but not yet released.
Hi @clonker. I'm not entirely sure what you mean by 'trunk version' of MDTraj. Can you send a link to that in the documentation? Do you mean using mdtraj.load rather than mdtraj.load_frame?
Ah okay I misunderstood. You mean using the newest version of mdtraj? I'm currently using 1.9.6
Okay that is not the most recent version. I mean installing it from source like:
git clone https://github.com/mdtraj/mdtraj
cd mdtraj
pip install -e .
Does that change anything?
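To confirm which MDTraj build an environment actually picks up after installing from source, a small check like the one below can help. This is only a sketch: the helper name installed_version is made up for illustration, and it reads the installed package metadata rather than importing mdtraj itself.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # A release install reports a plain version such as 1.9.6, while a
    # source checkout reports a development version string instead.
    print("mdtraj:", installed_version("mdtraj"))
```

Running this inside the activated conda environment shows whether the editable install replaced the released package.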
I upgraded to the newest mdtraj with pip install mdtraj -U, which took me to 1.9.6. This did not fix the error, so next I tried your method. I'm getting some compile errors that I'm working through right now.
I'm still on some compile error:
(moldym) C:\Users\GermanBarcenas\mdtraj>pip install -e .
Obtaining file:///C:/Users/GermanBarcenas/mdtraj
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: astunparse in c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages (from mdtraj==1.9.8.dev0) (1.6.3)
Requirement already satisfied: pyparsing in c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages (from mdtraj==1.9.8.dev0) (2.4.7)
Requirement already satisfied: numpy>=1.6 in c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages (from mdtraj==1.9.8.dev0) (1.22.2)
Requirement already satisfied: scipy in c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages (from mdtraj==1.9.8.dev0) (1.8.0)
Requirement already satisfied: wheel<1.0,>=0.23.0 in c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages (from astunparse->mdtraj==1.9.8.dev0) (0.37.0)
Requirement already satisfied: six<2.0,>=1.6.1 in c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages (from astunparse->mdtraj==1.9.8.dev0) (1.16.0)
Installing collected packages: mdtraj
Attempting uninstall: mdtraj
Found existing installation: mdtraj 1.9.6
Uninstalling mdtraj-1.9.6:
Successfully uninstalled mdtraj-1.9.6
Running setup.py develop for mdtraj
error: subprocess-exited-with-error
× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
C compiler:
Attempting to autodetect OpenMP support... Did not detect OpenMP support
running develop
running egg_info
writing mdtraj.egg-info\PKG-INFO
writing dependency_links to mdtraj.egg-info\dependency_links.txt
writing entry points to mdtraj.egg-info\entry_points.txt
writing requirements to mdtraj.egg-info\requires.txt
writing top-level names to mdtraj.egg-info\top_level.txt
reading manifest file 'mdtraj.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
C:\Users\GermanBarcenas\AppData\Local\Temp\pip-build-env-fv3xph7r\overlay\Lib\site-packages\setuptools\command\easy_install.py:158: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\Users\GermanBarcenas\AppData\Local\Temp\pip-build-env-fv3xph7r\overlay\Lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
no previously-included directories found matching '__pycache__'
no previously-included directories found matching '*.pyc'
adding license file 'LICENSE'
running build_ext
building 'mdtraj.formats.xtc' extension
error: command 'cl.exe' failed: None
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: No metadata found in c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages
Rolling back uninstall of mdtraj
Moving to c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages\mdtraj-1.9.6.dist-info\
from C:\Users\GermanBarcenas\Anaconda3\envs\moldym\Lib\site-packages\~dtraj-1.9.6.dist-info
Moving to c:\users\germanbarcenas\anaconda3\envs\moldym\lib\site-packages\mdtraj\
from C:\Users\GermanBarcenas\Anaconda3\envs\moldym\Lib\site-packages\~dtraj
Moving to c:\users\germanbarcenas\anaconda3\envs\moldym\scripts\mdconvert-script.py
from C:\Users\GermanBarcenas\AppData\Local\Temp\pip-uninstall-9u66gpoj\mdconvert-script.py
Moving to c:\users\germanbarcenas\anaconda3\envs\moldym\scripts\mdconvert.exe
from C:\Users\GermanBarcenas\AppData\Local\Temp\pip-uninstall-9u66gpoj\mdconvert.exe
Moving to c:\users\germanbarcenas\anaconda3\envs\moldym\scripts\mdinspect-script.py
from C:\Users\GermanBarcenas\AppData\Local\Temp\pip-uninstall-9u66gpoj\mdinspect-script.py
Moving to c:\users\germanbarcenas\anaconda3\envs\moldym\scripts\mdinspect.exe
from C:\Users\GermanBarcenas\AppData\Local\Temp\pip-uninstall-9u66gpoj\mdinspect.exe
error: subprocess-exited-with-error
× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
C compiler:
Attempting to autodetect OpenMP support... Did not detect OpenMP support
running develop
running egg_info
writing mdtraj.egg-info\PKG-INFO
writing dependency_links to mdtraj.egg-info\dependency_links.txt
writing entry points to mdtraj.egg-info\entry_points.txt
writing requirements to mdtraj.egg-info\requires.txt
writing top-level names to mdtraj.egg-info\top_level.txt
reading manifest file 'mdtraj.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
C:\Users\GermanBarcenas\AppData\Local\Temp\pip-build-env-fv3xph7r\overlay\Lib\site-packages\setuptools\command\easy_install.py:158: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\Users\GermanBarcenas\AppData\Local\Temp\pip-build-env-fv3xph7r\overlay\Lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
no previously-included directories found matching '__pycache__'
no previously-included directories found matching '*.pyc'
adding license file 'LICENSE'
running build_ext
building 'mdtraj.formats.xtc' extension
error: command 'cl.exe' failed: None
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
Apparently cl.exe is not found. I've looked through this site:
https://stackoverflow.com/questions/41724445/python-pip-on-windows-command-cl-exe-failed
I have followed this, but I'm still getting that error from cl.exe.
That is curious. I assume something went wrong with your compiler installation. It seems like it doesn't find the compiler (cl.exe), so perhaps your PATH isn't set up correctly. Perhaps this helps.
Thanks for the suggestion @clonker. I haven't had time to troubleshoot this problem. I did look at your link but I haven't had a chance to try their suggestion. I agree that the path to cl.exe is not defined. I tried to add it to PATH, but apparently I am doing it wrong.
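One way to see whether the shell that runs pip can find the compiler at all is to ask Python to resolve it on PATH. The sketch below uses only the standard library; find_compiler is a hypothetical helper name, and it assumes the compiler is expected under the name cl.exe (or cl).

```python
import shutil


def find_compiler(names=("cl.exe", "cl")):
    """Return the full path of the first compiler name found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_compiler()
    if location is None:
        print("cl.exe is not on PATH in this shell")
    else:
        print("Found compiler at", location)
```

If this prints that cl.exe is not on PATH, the build needs to be started from a shell where the Visual Studio build tools environment has been set up.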
Hi @germanbarcenas, does this issue concern only one trajectory or is it something you experience in general? Is it possible for you to send a small example script which reproduces the error?
The problem seems to be more general. The script is rather long, so I pasted some parts in my original post. Also, installing from pip has been giving me issues. I finally got the dev version (1.9.8) in my conda environment, but I suspect that mixing pip and conda is causing some conflicts, because my python environment keeps crashing. Is there any other way to download the dev version from conda or conda-forge?
Getting it through conda is not possible unfortunately and I cannot reproduce the error locally. It would be helpful if you could write a small script with maybe a very small subset of your data (or other data for that matter) which reproduces this error.
@clonker Here is a small version of the code to help us troubleshoot. I don't want to post it all, just the parts that are giving me an issue, so that you don't have to dig through unnecessary code. I'll also attach the topology and trajectory files. The .xtc file is rather large so maybe I could email this to you? Could that be the issue? I recently started to use chunksize=1000 to maybe help with some memory issue, but I still get the same error.
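import os
import numpy as np
import mdtraj
import pyemma
from pyemma.coordinates.data.featurization.misc import GroupCOMFeature

XTC = 'md_nojump.xtc'
PDB = 'hj_idt_ac-cl2.pdb'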
def COMAngle_function(traj: mdtraj.Trajectory):
    angles=[]
    centers1 = c1.transform(traj) # yields ndarray
    centers2 = c2.transform(traj) # yields ndarray
    centers3 = c3.transform(traj) # yields ndarray
    xyz = np.hstack((centers1, centers2, centers3))
    traj = mdtraj.Trajectory(xyz.reshape(-1, 3, 3), topology=None)
    # this has shape (n_frames, 1)
    angles = mdtraj.compute_angles(traj, angle_indices=[[0, 1, 2]], periodic=True)
    return angles

def COMDistance_function(traj: mdtraj.Trajectory):
    distances=[]
    centers4 = c4.transform(traj) # yields ndarray
    centers5 = c5.transform(traj) # yields ndarray
    xyz = np.hstack((centers4, centers5))
    traj = mdtraj.Trajectory(xyz.reshape(-1, 2, 3), topology=None)
    # this has shape (n_frames, 1)
    distances = mdtraj.compute_distances(traj, [[0, 1]], periodic=True)
    return distances

dyeName='SQ-Cl2'
Dye1Chain = "A" # Chain that the first dye is attached to (5th column of PDB file)
Dye2Chain = "C" # Chain that the second dye is attached to (5th column of PDB file)
dyeResidueName={'SQ-H2': 'SQA','SQ-Cl2':'SAC','SQ-Me2':'SM2','SQ-NO2':'SMA'
,'SQ-NMe2':'SDA', 'SQ-Sl2':'SS2', 'SQ-Sl3':'SS3' }
residueName=dyeResidueName[dyeName]
feat=pyemma.coordinates.featurizer(PDB)
topology = mdtraj.load_frame(XTC,0,top=PDB).topology
table, bonds = topology.to_dataframe()
terminalResidues={'DA3':'DA3','DG3':'DG3','DC3':'DC3','DT3':'DT3',
'DA5':'DA5','DG5':'DG5','DC5':'DC5','DT5':'DT5'}
terminalDataFrame=table.loc[table['resName'].isin(terminalResidues.values())]
terminalDataFrameFirstLast=terminalDataFrame.drop_duplicates(subset=['resName','chainID'])
dyeIndex=[residue.index for residue in topology.residues if residueName in str(residue)]
r1=topology.select("resid %i and name'C9'"%dyeIndex[0])+1
s1=topology.select("resid %i and name'C20'"%dyeIndex[0])+1
r2=topology.select("resid %i and name'C9'"%dyeIndex[1])+1
s2=topology.select("resid %i and name'C20'"%dyeIndex[1])+1
dye1=topology.select("resid %i" %dyeIndex[0])
dye2=topology.select("resid %i" %dyeIndex[1])
chainAStart=terminalDataFrameFirstLast.iat[0,3]
chainAEnd=terminalDataFrameFirstLast.iat[1,3]
chainAMid=int((chainAEnd-chainAStart)/2)
c1 = GroupCOMFeature(feat.topology, [[chainAStart+1]])
c2 = GroupCOMFeature(feat.topology, [[chainAMid]])
c3 = GroupCOMFeature(feat.topology, [[chainAEnd+1]])
c4 = GroupCOMFeature(feat.topology, [[dyeIndex[0]+0]])
c5 = GroupCOMFeature(feat.topology, [[dyeIndex[1]+0]])
feat.add_custom_func(COMAngle_function, dim=1, description='IDA_A')
feat.add_custom_func(COMDistance_function, dim=1, description='Dye Distance')
traj=pyemma.coordinates.load(XTC,features=feat,skip=10000)
cluster = pyemma.coordinates.cluster_kmeans(traj, k=50, max_iter=400, stride=1)
convergedlag=2500
mark = pyemma.msm.estimate_markov_model(cluster.dtrajs, lag=convergedlag, dt_traj='1 ps')
pcca_dist = mark.metastable_distributions
pcca_samples = mark.sample_by_distributions(pcca_dist, 50)
cluster_source = pyemma.coordinates.source(XTC, top=topology,skip=10000,chunksize=1000)
Project = "MSM"
dataPath=os.path.join(os.getcwd(), Project)
clusterDirectory='kmeanscluster'
clusterDirectoryPath=os.path.join(dataPath,clusterDirectory)
pyemma.coordinates.save_trajs(
    cluster_source,
    pcca_samples,
    outfiles=[os.path.join(clusterDirectoryPath, 'pcca{}_10samples.pdb').format(n + 1)
              for n in range(mark.n_metastable)],
    verbose=True)
Perhaps just the code on its own will be enough to troubleshoot the original error. Please email me if you would like the .xtc file. I'm not sure how else to send a large zip file.
I tried this code locally with my own data and it works fine. Two things that I changed though (maybe they are related to your problem): In this line
cluster_source = pyemma.coordinates.source(XTC, top=topology,skip=10000,chunksize=1000)
you use topology as top instead of XTC - is it the same?
and here
outfiles=[os.path.join(clusterDirectoryPath, 'pcca{}_10samples.pdb').format(n + 1)
for n in range(mark.n_metastable)],
it's better to use f"pcca{n+1}_10samples.pdb".
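As a minimal sketch of the f-string suggestion, the output file names could be built as below. The helper name pcca_outfiles and the n_samples parameter are illustrative and not part of PyEMMA; the produced names match the 'pcca{}_10samples.pdb' pattern from the script.

```python
import os


def pcca_outfiles(directory, n_sets, n_samples=10):
    """Build one output path per metastable set using an f-string."""
    return [
        os.path.join(directory, f"pcca{n + 1}_{n_samples}samples.pdb")
        for n in range(n_sets)
    ]


if __name__ == "__main__":
    # Hypothetical directory name and set count, for illustration only.
    for path in pcca_outfiles("kmeanscluster", 3):
        print(path)
```

The resulting list can be passed as the outfiles argument, with n_sets taken from mark.n_metastable in the script above.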
As xtc is a Gromacs format, maybe it helps to check the file with Gromacs: In the command line, try gmx check -f your_file.xtc, maybe that gives any hint if the file has issues? As a quick workaround, maybe saving the file in a different format helps for this incident?
Ahh yes, I was trying to change some variables in the pyemma.coordinates.save_trajs call to see if those might be causing my error. Normally I have just been using the following:
cluster_source = pyemma.coordinates.source(XTC, features=feat,skip=10000)
Is there a preferred file format rather than .xtc? GROMACS can also output .trr, which I believe pyemma can also read. Here is the output from gmx check as you suggested.
Command line: gmx_mpi check -f md_nojump.xtc
Checking file md_nojump.xtc
Reading frame 0 time 0.000
Atoms 3565
Precision 0.001 (nm)
Reading frame 204000 time 2040000.000

Item        #frames  Timestep (ps)
Step         204131  10
Time         204131  10
Lambda            0
Coords       204131  10
Velocities        0
Forces            0
Box          204131  10
GROMACS reminds you: "Either you will be dashed to atoms on crag points, or lifted up and borne by some master-wave into a calmer current" (Charlotte Bronte)
I'm not sure I see anything obviously wrong.
A quick update: I've reinstalled my conda environment, and I've added the following line at the start of my code:
config.use_trajectory_lengths_cache = False
Also, I tried using the .trr format as well. Nothing seems to work. Does anyone have any good pointers as to what error 13 is trying to say? Here is the .trr error message for those curious.
Traceback (most recent call last):
  File R:\GermanBarcenas\dna\gromacs\codes\markovAnalysis.py:986 in <module>
    pyemma.coordinates.save_trajs(
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\api.py:870 in save_trajs
    save_traj(traj_inp, i_indexes, outfile, stride=stride, verbose=verbose)
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\api.py:747 in save_traj
    traj = frames_from_files(trajfiles, top, indexes, chunksize, stride, reader=reader)
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\data\util\frames_from_file.py:138 in frames_from_files
    collected_frames = [f for f in it]
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\data\util\frames_from_file.py:138 in <listcomp>
    collected_frames = [f for f in it]
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\data\_base\datasource.py:1049 in __next__
    X = self._use_cols(self._next_chunk())
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\data\_base\datasource.py:1168 in _next_chunk
    x = next(self._it)
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\util\patches.py:203 in __next__
    return next(self._ra_it)
  File ~\Anaconda3\envs\fresh_pyemma\lib\site-packages\pyemma\coordinates\util\patches.py:243 in _random_access_generator
    f.seek(seek_to, whence=1)
  File mdtraj/formats/xtc/trr.pyx:727 in mdtraj.formats.trr.TRRTrajectoryFile.seek
RuntimeError: TRR seek error: 13
One more piece of information I should add is that this has worked for me in the past. The only difference between then and now is that I let my MD trajectory run for a longer time. Is it possible I'm running into some limit with respect to the size of my trajectory and the memory on my machine? Is the file possibly not closing because of some cache issue I might have? How could I check this? I suspect this might be a machine-specific bug, so maybe the issue can be closed, but I would still appreciate some help on this.
For example, when I use the pyemma config options, I can see that:
config.traj_info_max_size: 500
config.traj_info_max_entries: 50000
I don't understand 100%, but perhaps my memory isn't big enough for the trajectory I'm reading. By the way, it is 3500000kb in size, so that's what I think might be causing issues.
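A rough back-of-the-envelope estimate can show how much memory is actually at stake. The sketch below takes the frame and atom counts from the gmx check output above and assumes 4-byte (float32) values; the two function names are made up for illustration. It compares holding every coordinate in memory with holding only the two custom features (one angle, one distance) per frame.

```python
def coordinate_bytes(n_frames, n_atoms, bytes_per_value=4):
    """Bytes needed to hold x, y, z for every atom in every frame."""
    return n_frames * n_atoms * 3 * bytes_per_value


def feature_bytes(n_frames, n_features, bytes_per_value=4):
    """Bytes needed to hold a (n_frames, n_features) feature array."""
    return n_frames * n_features * bytes_per_value


if __name__ == "__main__":
    n_frames, n_atoms = 204131, 3565  # values reported by gmx check
    full = coordinate_bytes(n_frames, n_atoms)
    feats = feature_bytes(n_frames, 2)  # one angle plus one distance
    print(f"all coordinates: {full / 1e9:.2f} GB")
    print(f"two features:    {feats / 1e6:.2f} MB")
```

Under these assumptions the uncompressed coordinates come to roughly 8.7 GB, while the two-feature array stays under 2 MB, so loading with features=feat should need far less memory than the size of the file on disk suggests.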
No I don't think that is the issue. In the end though we can also build a workaround for saving these trajectories. Right now I am a bit busy but I will get back to you later with that script. Then perhaps it's clearer what goes wrong or if something goes wrong at all. I have never seen this error before so to be honest I am a bit stumped where it might come from.
One more thing you can try though: use mdconvert to convert the xtc to, for example, h5. This could give some more insight into what's going wrong.
Okay. It looks like I fixed it. First, I rolled back my .xtc to be 2.1 microseconds. I can probably push it further, but it seems like even my traj=pyemma.coordinates.load(XTC,features=feat,skip=10000) call was starting to have issues around 2.5. This is probably system dependent though. The thing that really seemed to cause the issue was actually the outfiles argument of save_trajs. It turns out the directory to save the .pdb files in had not been created, so it kept crashing on output. I just wish the error explained a little more.
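A minimal sketch of the fix described above, assuming the only problem is that the parent directory of each output file does not exist yet. The helper ensure_parent_dirs is hypothetical and uses only the standard library; it would be called on the outfiles list before pyemma.coordinates.save_trajs.

```python
import os


def ensure_parent_dirs(outfiles):
    """Create the parent directory of every output path if it is missing."""
    for path in outfiles:
        parent = os.path.dirname(os.path.abspath(path))
        # exist_ok=True makes this a no-op when the directory is already there.
        os.makedirs(parent, exist_ok=True)


# Usage idea (names taken from the script earlier in this thread):
#   outfiles = [os.path.join(clusterDirectoryPath, f"pcca{n + 1}_10samples.pdb")
#               for n in range(mark.n_metastable)]
#   ensure_parent_dirs(outfiles)
#   pyemma.coordinates.save_trajs(cluster_source, pcca_samples, outfiles=outfiles)
```

Only directories are created here; the output files themselves are still written by save_trajs.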
Interesting, we should add a more descriptive error message then, preferably directly to mdtraj.
@germanbarcenas would you be willing to submit a PR that checks whether the output directory is available?
@clonker Sure! So, I'm a pretty fresh beginner to github. Can you give me an example of what I would say in the PR? Is it more of a request to make that error more explicit, or should I offer some kind of solution as well?
Cool! PR means that you make a fork of the repository (there is a Fork button at the top of this page), which is your own personal copy of PyEMMA. With that copy you can do whatever you want, for example you could go into the save_trajs implementation and add a check if the output directory exists if it is set. That check you push onto your fork and then you can make a pull request against this repository, which basically is making a request to incorporate your changes into the codebase.
Here is a nice tutorial explaining all the necessary steps - let me know if you need help. :)
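The kind of check such a PR might add could look roughly like the sketch below. This is not PyEMMA's actual code: check_output_dirs is a hypothetical stand-alone helper, and where exactly it would be called inside save_trajs is left open. The idea is simply to fail early with a message that names the missing directory.

```python
import os


def check_output_dirs(outfiles):
    """Raise a descriptive error if any output file points into a missing directory."""
    missing = sorted({
        os.path.dirname(os.path.abspath(path))
        for path in outfiles
        if not os.path.isdir(os.path.dirname(os.path.abspath(path)))
    })
    if missing:
        raise FileNotFoundError(
            "Output directory does not exist: " + ", ".join(missing)
        )
```

Raising before any trajectory is read means the message appears immediately instead of surfacing later as an unrelated seek error.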
@clonker I have this on my todo list(:
Hello,
I have a .xtc file that is being read by
traj=pyemma.coordinates.load(XTC,features=feat,skip=10000)
and it works just fine. Later on I perform some coarse graining, and to extract the .pdb files for this, I do this:
When I create the traj variable everything works, including building the MSM. When I try the pyemma.coordinates.save_trajs line, I get the following error:
This is confusing to me, as the .xtc worked when it was read initially. I saw an old issue (https://github.com/markovmodel/PyEMMA/issues/306) that had a similar error. It was suggested that the .xtc file be closed. I still don't see documentation on this, and when I tried traj.close(), I got an error about close not working on NumPy arrays. The only other time the .xtc is read is in this line:
topology = mdtraj.load_frame(XTC,0,top=PDB).topology
but this also doesn't close it. Error 13 persists when I remove the skip=10000 argument on the cluster_source line. Is there some method to close the .xtc file? I currently use pyemma v2.5.9.
Thank you,