Open sup27606 opened 7 years ago
First of all, thanks very much for your effort in making the interconversion between MD formats easier, and sharing your work with the community.
I fixed the parse_dihedral() errors by editing desmond_parser.py, but now there are more errors. Failing in parse_torsion_torsion(), because the format of torsion ffio block is different in the 2016 version of Desmond cms than what the program expects. In the sample cms file from your distro, the torsion ffio line looks like this: 1 3 15 16 17 15 16 17 25 cmap 1 In my cms file, it is this: 1 2 7 8 9 7 8 9 24 1 cmap
This is a simple fix for now, but not sure how many other changes have to be made. The problem is, these errors don't occur immediately when the script starts running.
I am not an expert in the cms format or python. However, in hindsight, the problem is, you are hardcoding all your array variables, how they index the ffio block parameters. However, cms (as far as I understand) is a flexible format. What you should really do is read the mapping of the cms block from the start of a ffio block, for example: ffio_torsion_torsion[722] { i_ffio_ai i_ffio_aj i_ffio_ak i_ffio_al i_ffio_am i_ffio_an i_ffio_ao i_ffio_ap i_ffio_c1 s_ffio_funct
and map your array variables accordingly. That way, the code won't break everytime the ordering of parameters in the ffio block changes.
Do you have any updated version of your software that takes care of the problems I am facing? Thank you.
Thank you @sup27606 for digging into this. What you are suggesting sounds like a very reasonable (and more correct) way to handle these ffio blocks.
Just to confirm, do these changes fix the problem you were having or is there still an energy discrepancy?
Hi Christoph, The conversion worked without further errors and the desmond energies are being written, but the gromacs energy is failing. It seems gmx is finding some problems with the gro file. Here is the output from grompp:
Program gmx grompp, VERSION 5.1 Source code file: /projects/Biogroup/GROMACS/gromacs-5.1.0-desktop/gromacs-5.1/src/gromacs/fileio/confio.c, line: 1048
Fatal error: Something is wrong in the coordinate formatting of file test.gro. Note that gro is fixed format (see the manual) For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
I opened the gro file in VMD and it seems like some of the hydrogen coordinates look wrong (screenshot attached). Also, the x,y,z coordinates in the test.gro file has 12 decimal points, whereas my other gro files all have 3 decimal points. Could this be the issue, since gro is supposed to be a fixed format according to online resources? http://manual.gromacs.org/online/gro.html
Thanks. Supriyo test.gro.zip
Gro is a fixed file format but you can change the precision as stated in the link you provided:
Optionally (for now only yet with trjconv) you can write gro files with any number of decimal places, the format will then be n+5 positions with n decimal places (n+1 for velocities) in stead of 8 with 3 (with 4 for velocities). Upon reading, the precision will be inferred from the distance between the decimal points (which will be n+5)
Coordinate conversion errors have been an extremely rare issue though so this is interesting. I won't have time to dig deeper until the weekend. @mrshirts have you seen this before? Any immediate red flags?
@ctk3b, I think I know what's the problem, but I may be wrong. The flexible format applies only to the coordinates and velocities, but the first four records, ie. residue number, residue name, atom name, atom number are still fixed format. 5 positions each for these four records. Here is a small portion of the gro file produced by the program:
279TYR H9999 9999 -0.715489500000 0.923163500000 3.138024100000 279TYR H1000010000 -0.775758900000 1.171353300000 3.472607400000 279TYR H1000110001 -0.765668700000 0.715371900000 3.281781800000
The first line conforms to the gro format. However, in the second and third line, atom name exceeds 5 positions (H10000, H10001). If you rename the atoms such that the length of the atom name does not exceed 5 characters, I think it would solve the problem.
The first line conforms to the gro format. However, in the second and third line, atom name exceeds 5 positions (H10000, H10001). If you rename the atoms such that the length of the atom name does not exceed 5 characters, I think it would solve the problem.
That seems to be the problem. FWIW, here's how ParmEd handles this
@swails, thanks for the pointer. In the grofile_parser.py, it is doing something like that, i.e. limiting the field lengths to 5 characters, but somewhere there may be a bug. I don't have enough python knowledge to figure out. Here is the snippet where it seems to be doing that.
Here is the snippet where it seems to be doing that.
That will work for atom numbers, but not residue numbers.
Observe:
>>> len('{:5d}'.format(1))
5
>>> len('{:5d}'.format(1000000))
7
{:5d}
just means that the resulting field will always have at least 5 characters (with leading spaces filling out any that are not used for numbers). However, unlike Fortran (which will not allow a field to be overfilled, and will fill a field with ***'s instead if a number is too large to fit in a given field), Python will simply use as many characters as it needs.
The way to fix that snippet is to replace
gro.write('{0:5d}{1:<5s}{2:5s}{3:5d}'.format(
atom.residue_index, atom.residue_name, atom.name, (n + 1)%100000))
with
gro.write('{0:5d}{1:<5s}{2:5s}{3:5d}'.format(
atom.residue_index%100000, atom.residue_name, atom.name, (n + 1)%100000))
Hi, all-
Thanks for the work on this -- I have some deadlines, but will work on this more tonight. We generally try to use the flexible formatting of the CMS, but may have missed it in one point. Cmap particularly was kludged in, so I'm not surprised there is an issue there (Desmond keeps changing how it handles things).
Hi, sup27606-
I see the issues here, but other than the .gro file issue, it might take a week or so to carve out the time to fix it right in the general case. For your particular problem, do you have enough information to fix the installation for the particular .cms version you have? Thanks for the help in identifying issues for the general case.
Hi @ctk3b , @mrshirts , The fix suggested by @swails corrected for the atom indices, but the atom names still caused problems, whenever they were longer than 5 characters. I replaced the relevant line with this:
gro.write('{0:5d}{1:<5s}{2:5s}{3:5d}'.format( atom.residue_index%100000, atom.residue_name[0:5], atom.name[0:5], (n + 1)%100000))
It is running now and producing energy output. There are some differences in energy between desmond vs gromacs. Are these differences reasonable? I am using the default configs from Intermol. Thanks.
type input(desmond) output (gromacs) diff (gromacs)
coulomb (LR) 0.00000000 9594.75683594 9594.75683594
coulomb (SR) 0.00000000 -1132896.87500000 -1132896.87500000
coulomb total -701198.87919120 -987112.35253906 -285913.47334786
coulomb-14 136189.76776880 136189.76562500 -0.00214380
improper 1503.64665220 1503.66784668 0.02119448
proper 16113.43920960 15793.74607849 -319.69313111
urey-bradley 0.00000000 11922.68359375 11922.68359375
vdw (LR) 0.00000000 -7635.53271484 -7635.53271484
vdw (SR) 0.00000000 62529.02343750 62529.02343750
vdw total 70507.26184368 62872.06787109 -7635.19397259
vdw-14 7978.54248528 7978.57714844 0.03466316
angle 23509.59789000 26668.10546875 3158.50757875
bond 12581.90229488 9423.29199219 -3158.61030269
bonded 53708.58604668 53388.81138611 -319.77466057
dihedral 17617.08586180 17297.41392517 -319.67193663
nonbonded -630691.61734752 -924240.28466797 -293548.66732045
potential -875056.34542400 -870851.50000000 4204.84542400
Coulomb-14, LJ-14 are good.
Note that the bonded differences equal to the difference dihedrals. It looks like angle and bond are having issues with the Urey-Bradley terms being written in the wrong category, but that error cancels -- that is a bookkeeping error, not a conversion error.
But the dihedrals are off. This needs to be addressed.
"Nonbonded" being off is not really an issue - that's a bookkeeping error that apparently isn't possible to fix right now -- I need to make it more clear.
Potential is probably about right.
However, thought several of these are issues we fixed on a commit on March 13th. When was the version that you are using downloaded?
I have some code fixes for the other issues, but am still debugging them - hopefully can get a pull request in the next few days.
Thanks for the assessment. Although the relative error in the total potential is small, it is about one order of magnitude greater than the error in dihedral. So there are some differences in the nonbond besides the dihedral that are contributing to this difference in total potential. Is this a systemic error or a random error that is expected because of truncation of decimal places (angstrom to nm)?
I downloaded the code last week from github.
Is this a systemic error or a random error that is expected because of truncation of decimal places (angstrom to nm)?
Because the nonbonded cutoffs are practically impossible to match up exactly. if the LJ and Coul 1-4 interactions are this close, it means the translation of nonbondeds is correct - but you can't match the treatment of the long-range interactions precisely between programs.
I downloaded the code last week from github.
Hmm, let me keep looking at this.
Thank you very much. I plan to use the converted trajectories mainly for short range protein-ligand interaction energy. That way, the energy values should be more comparable.
Desmond to gromacs conversion is failing with the following error: Traceback (most recent call last): File "/Users/supriyo/data/PostDoc Stuff/codes/InterMol/intermol/convert.py", line 811, in
main()
File "/Users/supriyo/data/PostDoc Stuff/codes/InterMol/intermol/convert.py", line 226, in main
system, prefix = _load_desmond(cms_in=cms_in)
File "/Users/supriyo/data/PostDoc Stuff/codes/InterMol/intermol/convert.py", line 589, in _load_desmond
system = des.load(cms_file=cms_in)
File "/Library/Python/2.7/site-packages/intermol/desmond/desmond_parser.py", line 39, in load
return parser.read()
File "/Library/Python/2.7/site-packages/intermol/desmond/desmond_parser.py", line 1141, in read
self.load_ffio_block(molname, self.ffio_blockpos[i], self.fmct_blockpos[i+1]-1)
File "/Library/Python/2.7/site-packages/intermol/desmond/desmond_parser.py", line 903, in load_ffio_block
self.sysDirectivetype
File "/Library/Python/2.7/site-packages/intermol/desmond/desmond_parser.py", line 698, in parse_dihedrals
kwds = [float(x) for x in split[6:14]]
ValueError: could not convert string to float: "B5SP"
More precisely, it is failing at the following lines in the cms file:
grep B5SP cbFDH_formate_vel5_2-out.cms 9 11294 11297 11298 11299 Proper_Trig 0.000000 1.300000 "B5SP" -0.600000 0.650000 0.050000 0.000000 0.000000 0.000000 101 11317 11320 11321 11322 Proper_Trig 0.000000 1.300000 "B5SP" -0.600000 0.650000 0.050000 0.000000 0.000000 0.000000 199 11369 11372 11373 11374 Proper_Trig 0.000000 1.300000 "B5SP" -0.600000 0.650000 0.050000 0.000000 0.000000 0.000000 291 11392 11395 11396 11397 Proper_Trig 0.000000 1.300000 "B5SP" -0.600000 0.650000 0.050000 0.000000 0.000000 0.000000
The cms file contains strings with quotations within the dihedral section. The parse_dihedral module in desmond_parser.py is trying to interpret it as float ( kwds = [float(x) for x in split[6:14]] )
The command used is: python convert.py --des_in cbFDH_formate_vel5_2-out.cms --gromacs --oname test
cbFDH_formate_vel5_2-out.cms.zip