selimsami / qforce

Apache License 2.0
57 stars 13 forks source link

Amber #62

Open mateuszanotto opened 1 year ago

mateuszanotto commented 1 year ago

Added amber file format support (.itp)

Added a progress bar while qforce is being initialized (importing modules)

(there are more commits than necessary 'cause I was still figuring it out how to do it properly)

selimsami commented 1 year ago

Hi Mateusz, The changes look very exciting, but the CI seems to be failing. Could you please see what's causing that? Thanks!

mateuszanotto commented 1 year ago

Oh, i will look into it!

mateuszanotto commented 1 year ago

I believe it was only missing the misc.py with the logo with # on front, for the .mol2 files

mateuszanotto commented 1 year ago

Hi Selim, Thank you so much for the comments! I will work on them right away Kind regards, Mateus

mateuszanotto commented 1 year ago

Hi Selim, I uploaded the requested changes: moved the ff.write(...) to the forcefield.py, removed unnecessary comments, removed Loading Bar from this PR and changed the styling to PEP8.

selimsami commented 1 year ago

Hi @mateuszanotto,

Thanks for all the changes! I went ahead and made a few more cosmetic changes myself. One thing that is worth mentioning is that I changed the keyword from "forcefield" to "output_software" as gromacs/amber are not really forcefields.

Additionally, I noticed a possibly important bug (see my comment above). After that's resolved, it think it is ready to be merged.

Cheers, Selim

selimsami commented 1 year ago

@mateuszanotto , as I commented above, please do not just remove the Urey-Bradley printing lines, this will yield an inaccurate FF.

selimsami commented 1 year ago

@mateuszanotto I would also suggest that you generate AMBER FFs for benzene and propane and compare vibrational modes/frequencies to GROMACS ones to make sure that your implementation is correct.

mateuszanotto commented 1 year ago

Hi @selimsami,

Thank you for the suggestion, I will run the vibrational modes/freqs to compare them with GROMACS.

Regarding the UB angles, I will investigate with Prof. Carles a bit more on how to use them on AMBER then.

Again, thank you for the suggestions and the patience.

selimsami commented 1 year ago

Hi @mateuszanotto , have you had a chance to look at this yet?

mateuszanotto commented 1 year ago

Hi @selimsami I tried a couple of ways, but running normal mode analysis on amber is not as straightforward as I hoped. Carles was busy at the beginning of the year and he had a chance to talk to me only last week. I got back to this on Monday and I'm focused on solving it soon.

mateuszanotto commented 1 year ago

I was reading the QForce paper SI and just to make sure, the normal mode analysis is done on the gas phase production MD?

selimsami commented 1 year ago

Hi Mateus, Good to hear that you're back at it. It is nice work and close to completion :) Yes, it is done on the gas phase.

mateuszanotto commented 1 year ago

Hello @selimsami, To give you a follow-up. The vib. freq. for the benzenes are close to the QM frequencies. It looks almost the same without Urey-Bradley's potential, so it's better to keep it. benzene For the propane though, the frequencies around 1500cm-1 (C-H bending) are pretty off. The only difference is the addition of flexible dihedrals, so the problem should be there. propane I'm currently using 4 terms. I will try calculating with all 6 terms just to make it sure, and also check if is not a problem when building the topology/trajectory.

Screenshot 2023-04-03 at 10 00 53
selimsami commented 1 year ago

Hi @mateuszanotto, thanks for the update!

Could you clarify for me please what have you done in the first case: 1) is UB term turned on in Q-Force? 2) have you implemented in your amber FF writer the UB term?

I am not sure what you mean by "better to keep it", keep what?

Regarding the propane case, I think your dihedral profile looks quite good already. I don't think that's the reason behind the mismatch, also because dihedral potentials typically affect frequencies below 500 cm-1. So I think what you see there is an issue with the UB potential!

It has been a while, but what I had suggested last time was:

Urey-Bradley is simply a combination of two potentials:
1) A harmonic angle potential, which you already specify in the FF.
2) A harmonic bond potential, between atoms 1 and 3 of that angle

So all you have to do is adding "single harmonic bond potentials" between atoms 1 and 3 of your angles, using the force constants and equilibrium bond lengths that Q-Force has determined.

Exactly the same way you define standard bond potentials!

Have you implemented this? Otherwise the mismatch you get is not surprising.

Cheers, Selim

mateuszanotto commented 1 year ago

Then the problem is probably at the UB term.

I implemented the UB terms only as a bond term between the 1 and 3 atoms, but I didn't add the angle term as I understood it was already being taken into account. I'll double-check the angles and tomorrow I give you a follow-up.

Thank you so much, Mateus

mateuszanotto commented 1 year ago

Hello @selimsami,

I confirmed that the UB terms were implemented right, the problem was the dihedral. I removed the 4 terms Ryckaert-Ballemans potential, leaving a single term and it gives vib. freq. close to the QM. propane

I followed exactly what the Amber manual says about how it calculates the Fourier series, but it's obviously not giving the expected results. I checked and gaff2 also uses a single term for propane dihedrals. Do you want me to make any other test, maybe with some molecule with cis and trans conformation?

selimsami commented 1 year ago

Hi @mateuszanotto , Great that the UB term is implemented! Do I get it right that you changed the dihedral function to just a single term dihedral? Unfortunately this won't work with anything but the simplest dihedrals like the propane. To have a good match between what Q-Force calculates and what you write in the FF file, I think we'd need to have either Bellemans or Fourier dihedral function. Perhaps we could figure out what went wrong there? What kind of a functional form did you use, and how did you convert Q-Force output to Fourier? Just as a reference, this is what is on GROMACS manual: Screenshot 2023-04-06 at 10 40 28 AM

mateuszanotto commented 1 year ago

Yeah, I understand why this won't work for more complex cases. It would be helpful if we tried to figure it out, thank you.

Amber uses this functional form:

Screenshot 2023-04-10 at 12 13 01

and it is written on the output as:

Screenshot 2023-04-10 at 12 09 43

I used the RB parameters (Cn) inside QForce as the Potential barrier. In this graph, I plotted the Etorsion on Amber form using Cn/2 vs Gromacs form using Cn.

Screenshot 2023-04-03 at 10 00 53

One thing that I did was to see how Amber prints the parameter file for propane using gaff2. It prints the dihedrals with a single term and improper dihedrals, which I may be missing. Do Fourier/RB dihedrals cover both cases?

selimsami commented 1 year ago

They seem to have figured out the conversion in Parmed:

https://github.com/ParmEd/ParmEd/pull/837/files

More specifically:

        phi = 0 * u.degrees
        fc0 = 1.0*c0 + 0.5*c2 + 0.3750*c4
        fc1 = 1.0*c1 + 0.75*c3 + 0.6250*c5
        fc2 = 0.5*c2 + 0.5*c4
        fc3 = 0.25*c3 + 0.3125*c5
        fc4 = 0.125*c4
        fc5 = 0.0625*c5

Perhaps this is something we can adapt?

mateuszanotto commented 1 year ago

I adapted the conversions as in Parmed and the results are better, but still off propaneRB

I noticed that QForce is printing the dihedrals H-C-C-C and C-C-C-H. I added manually the H-C-C-H dihedral and the vib. freq. get close to the QM reference. I think this may be what was missing. The .itp also doesn't print it, so I suppose Gromacs don't need it. Is there a way to retrieve this dihedral inside QForce? propaneHCCH

selimsami commented 1 year ago

Hi @mateuszanotto , Could you push the latest version of the code so I can have a look at it? Q-Force does not put a dihedral function in the non-primary direction of the dihedral (in this case: H-C-C-C), so I don't think that's the issue. And if it was, the same issue would also exist in GROMACS.

mateuszanotto commented 1 year ago

I just pushed the latest version. The case is that Amber's topology/coordinate generator asks for this non-primary direction.

Screenshot 2023-04-13 at 14 10 27

The program was taking the HCCC dihedral param. for this entry before. Though I understand that making Q-Force put dihedral functions for non-primary dihedrals is more complicated, right?

selimsami commented 1 year ago

3 questions:

  1. Maybe I'm missing something but why the c = dihed.equ/2 ?
  2. In the example I shared, equ = 0 always. Though I managed to reproduce the RB potential with equ = 180. Could you try those two options please?
  3. What happens if you add manually a dihedral function with no barrier (so no potential) to H-C-C-H ?
mateuszanotto commented 1 year ago

Hi @selimsami

Answering your questions: First, writing a function with no barrier for the H-C-C-H works quite well, so I did all the tests using it.

  1. Amber's dihedral function uses Vn/2 Screenshot 2023-04-14 at 10 47 01

    but internally it's passed as Vn/divider (the eq. 15.5 that I showed before).

The .frcmod files are being written with the divider=1 and the c = dihed.equ/2, but I can switch it so it's less confusing. Just to be sure I calculated the freq. using c = dihed.equ and div=1. This is the result. propaneCn1

  1. Using equ=0 changes mainly the first 2 vib. modes. I understand that this will affect if the molecule prefers the trans (phi=180) or cis (phi=0) conformation, which is not relevant for propane. I alternate them because the parameters are fitted for 1+cos(n phi)=cos(n phi-0) and 1-cos(n phi)=cos(n phi-180), is that correct? propanePhase0

  2. Finally, these are the vib. modes with a function with no barrier for the H-C-C-H. Here I'm using a Fourier with 6 terms converted as in Parmed, the alternating phases, c = dihed.equ, and div = 2. propaneHCCH0 I did it with c = dihed.equ/2, and div = 1, and it's the same.

selimsami commented 1 year ago

Hmm, I'm starting to think that we have more issues than just the dihedral potential.

Here are the results I get from GROMACS with and without a dihedral function.

With dihedral function: Screenshot 2023-04-14 at 11 03 08 AM

Without dihedral function: Screenshot 2023-04-14 at 11 03 11 AM

The mean absolute error (MAE) is 3.5% and 8% with and without the dih. function.

You see that the dihedral function only affects the lowest frequencies in my case. For you it seems to affect everything?

Even the C-H bond stretch freq. seems to be somewhat off in your case at ~3000 cm-1.

You could perhaps turn off all dihedral functions and show what kind of a plot and MAE you get?

Could you also share the final force field files you get that is printed by qforce?

mateuszanotto commented 1 year ago

Hi @selimsami,

These are the results I get from AMBER with and without a dihedral function.

With dihedral function (4 terms): dihedralfunc

Without dihedral function (potential=0): noDihe

The (MAE) is 9.2% and 6.6% with and without the dih. function. So using Fourier transforms increases the error. Indeed it affects the whole spectrum.

I'm usually calculating the frequency analysis on 5 frames as I'm using my notebook to do it. I have it running now with 50 frames to to rule out if it's not a statistical problem, but it should take a few days.

This is the force field file printed by qforce propane_qforce.txt

but in order to build the topology I have to write the dihedrals with wild cards: X- 1- 2- X [...] so X can be any atom. This works for simple molecules such as propane, but for more complex ones this is troublesome for the end user. A way to avoid this would be printing every single dihedral.

A default frcmod is like this: frcmod.txt so it has only one entry for each atom type interacion. If there are two C-C bonds with different parameters, one would overwrite the other.

selimsami commented 1 year ago

Hi Mateus, thanks for the update! What do you mean that you're running the frequency analysis on 5 frames? What is the procedure that you use to compute the vibrational frequencies? Typically, you would compute the numerical derivative of the forces by minimally displacing the each atom in xyz directions. GROMACS has a tool for this. Probably AMBER too? This process should take couple of seconds on a laptop, not days for sure!

mateuszanotto commented 1 year ago

Amber has two tools to calculate the vibrational frequency. The method I'm using is the MM/PBSA, which is used to calculate binding energy from a set number of snapshots from the MD. It calculates entropy terms using quasi-harmonic approximation or the normal mode approximation. This is where I get my frequencies from. The other method is the same as GROMACS. It diagonalizes the mass-weighted matrix of all atoms to obtain the eigenmodes and perform a quasi-harmonic analysis at a set temperature. I did many tests before, but at that time I never managed to get reasonable frequencies (all were below 20cm-1). I will try it again now that we have the dihedrals with the conversion term.

selimsami commented 1 year ago

Considering the referece (QM Hessian calculation), the second options seems like a more similar comparison indeed. There are no entropic contributions there.

One advice regarding that - make sure you have really good optimized structure. use high convergence criteria and double precision if it's possible. That makes a big difference on the results.

And I would not use any of the "quasi-harmonic analysis at a set temperature", There are no temperature effects on the reference vibrational spectra

mateuszanotto commented 1 year ago

Hello @selimsami,

I investigated further the vib. freq. methods inside amber and they call the same routine internally, which uses the hessian. I compared the frequencies from QForce FF for Amber with AMBER FF and GAFF2 and found two main problems.

  1. The urey-bradley bond terms are not being read. I'm creating a parameter file (.frcmod) and using the topology builder (leap) to create the topology (.prmtop), which is formatted differently. Leap ignores the interactions between 1-3 bonds. Using the AMBER force field with additional Urey-Bradley terms requires creating the topology's UB sections directly. Although, Amber MD can read UB terms from CHARMM force field (.psf), which includes Urey-Bradley.

  2. I set UB terms to False on settings.ini. It shifts the frequencies slightly, but not as much as expected. The MAE goes from 9.55% to 7.65% download And using the dihedrals from gaff2 with a single term, the MAE is 4.72% download (2) So the dihedrals are still a problem.

I'll work on converting the .itp to CHARMM FF with ParmEd to check how it performs on AMBER. If you have any other suggestions, I can also test it.

Once it's validated, I thought that I can implement support to .psf files as it would make QForce available for CHARMM and AMBER at the same time. I didn't want to waste the .frcmod implementation but AMBER seems more stiff with this format than we first thought.

selimsami commented 1 year ago

Hi Mateus, Thanks for the update. Even with the UB turned off, the frequencies seem off? Even the higher frequencies so I think dihedrals are still not (the only) issue. I am really not sure what the issue is. If you're sure the bond/angle terms are correct, I'd look more closely to the settings for the computation of the Hessian:

mateuszanotto commented 1 year ago

I was looking at the dihedrals from Amber FF and they were all much lower than the one I was using. I just found out that I didn't convert it to kcal/mol after implementing the conversion.

These are the frequencies straight from QForce with urey = False. MAE=4.84% download (4)

It seems this was the main source of error. The bonds/angles terms seem a bit underestimated with urey=false. Here I used gaff2 bonds and angles terms to see how they influence the spectra. gaff2bonds MAE = 3.22% gaff2angles MAE = 2.12% both combined, the MAE is 0.48%

selimsami commented 1 year ago

Hi Mateusz, I think you nailed the bonds/angles without Urey, that's great! This is what I get with GROMACS: Screenshot 2023-05-15 at 3 09 39 PM

However, as you can see the dihedrals (3 lowest frequencies) still seem to be off. Any idea what's happening there?

Is shutting off UB gonna be the final solution for AMBER? That does not seem ideal. I remember you were mentioning a PSF file format that can work with this?

MAE = 3.22% This looks a little weird. Looking at the plot you show from Q-Force to gaff bonds, especially at higher frequencies, the frequencies only get worse yet somehow the MAE decreases? any idea what's happening there?

mateuszanotto commented 1 year ago

Oh, it is excellent that the AMBER and GROMACS results are so close without urey. Good to know things are working the way they should. Regarding the 3 lowest frequencies, I didn't find any other FF on amber using Fourier transforms for the HCCC/HCCH dihedrals to compare. I also realized I was removing the 2 lowest freqs to calculate the MAE. I recalculated the MAEs using sklearn.metrics. QForce - MAE=10.07% QF/GAFF2 bonds - MAE=10.16% QF/GAFF2 angles - MAE=8.93%

Still removing the 2 lowest freqs, I get: QForce - MAE=5.17% QF/GAFF2 bonds - MAE=5.27% QF/GAFF2 angles - MAE=3.94%

Indeed, shutting off UB is not the best solution, and the lowest freqs are still off. Amber's manual says it has support for UB terms using PSF files directly, but I've never used it. I'll focus on converting ITP to PSF with ParmEd and seeing how it performs with AMBER.

xiki-tempula commented 6 months ago

Hi, I'm interested in using Qforce for my amber simulations. I wonder what is the state of the PR? I could offer some help as well?

selimsami commented 6 months ago

I think it was getting close but some of the discrepancies were not corrected in the end. The issues I believe at the moment are with the dihedral conversion and the Urey-Bradley term. But I don't use AMBER so I don't have the time/knowledge/interest to try to fix these issues. Feel free to have a go at it though! I think Mateusz did a good job and it was close to being ready.

xiki-tempula commented 6 months ago

Ok, I had a go but it seems that if I tried to generate the parameters in this way

loadamberparams ligand_qforce.frcmod
ligand = loadmol2 ligand_qforce.mol2
saveAmberParm ligand ligand.parm7 ligand.rst7

Then I got message saying that some of the torsions are missing.

Then I checked the Gromacs topology and these torsions are missing as well. I guess in Gromacs as long as any atom is connected via a bond, it is fine. But for amber, it demand all dihedrals need to be present.

xiki-tempula commented 6 months ago

For example, in the ligand, Qforce defines 4 8 16 12 5 8 16 12 But amber also require 4-8-16-32 5-8-16-32 Which is not really necessary as 32 is fixed by the 16 8 12 32 improper. image

xiki-tempula commented 6 months ago

And the parmed seems to be using a different conversion factor now? https://github.com/ParmEd/ParmEd/blob/master/parmed/topologyobjects.py#L2724

xiki-tempula commented 6 months ago

Also it seems that all the improper torsions are missing from the frcmod

mateuszanotto commented 6 months ago

Indeed I was struggling with improper torsions, but I didn't catch the mistake