selimsami / qforce

Apache License 2.0
57 stars 13 forks source link

Help with running qforce - 3 issues #59

Open simonlichtinger opened 2 years ago

simonlichtinger commented 2 years ago

Dear qforce developer(s), bug_report_qforce.zip

I'm attaching a pdb file of the ala-ala dipeptide which I would like to parametrise in qforce (because gaff2 assigns inappropriate dihedrals), [ala-ala.pdb]. Within the workflow, I encountered 3 issues.

Software versions: qforce latest pip install, 0.6.6 orca latest version, 5.0.3

qforce command: qforce ala-ala.pdb -o settings.txt

Rearrangements in bonding patterns on QM geometry optimisation

Using the default QM parameters in the orca engine [settings.txt], two hydrogen atoms migrate during the QM optimisation [distorted.xyz]. I have also tried starting from a different conformation of my dipeptide by rotating some dihedrals, in case I was just at a weird point in conformational space. However, no matter the initial ala-ala conformation I tried, weird reactions happened, including cyclisations [ala-ala-rot.pdb -> cyclised.xyz].

When I changed the QM method to HF 6-31G* for all orca steps involving the 'opt' keyword, this issue disappeared (all subsequent parts of this report are based on these calculations). While that may not be an issue with qforce itself, I thought I would ask you if that's something you've seen before? What could be the cause?

Fragmentation procedure doesn't recognise charged groups

The ala-ala dipeptide is overall neutral, but has two charged groups at the N- and C-terminus. Some dihedral fragments however carry an overall charge because they only contain one of the charged groups. Qforce doesn't seem to recognise this, in the generated orca input files the charge is listed as zero for these fragments, and orca won't run because an odd number of electrons isn't compatible with a multiplicity of 1. I fixed this manually by going into the inp files and changing the charge appropriately, but it would be a great feature for qforce to automatically recognise this, otherwise it doesn't seem to be able to deal with molecules carrying formally charged groups.

The dihedral scan fitting is way off

With the two manual 'fixes' in place, I was able to proceed to dihedral scanning. However, about half the dihedrals have horrendous fits. I'm providing one example here, arguably the worst one [CC_H11C4N2O1_4f1477e015ca74742dd44bd9383dc434~1_opt.xyz], gives the following fit:

image

I have no idea what the issue may be, and I would really appreciate your help!

Many thanks Simon

All referenced files needed to reproduce my runs are in the attached zip file.

selimsami commented 2 years ago

Hi Simon,

Thanks for the detailed bug report.

1) Rearrangements in bonding patterns on QM geometry optimisation

This indeed seems like a QM problem. @xiki-tempula implemented ORCA in Q-Force with the r2SCAN-3c as the default optimizer. I have no experience with this method so I cannot tell why it would behave like that, maybe he has more insight. The usual Q-Force default that most tests have been done is PBE D3-BJ with 6-31+G(D) basis set. You might want to try it with that and in general it might be a good idea for us to have a uniform default QM method across all QM software...

2) Fragmentation procedure doesn't recognise charged groups

Fragmentation procedure should recognize charged fragments. However, there was a bug specifically in the ORCA interface that I have now fixed. Thanks for catching that! Please update to 0.6.7 and let me know if it works.

3) The dihedral scan fitting is way off

If you have it available, can you try to use GROMACS for the dihedral scanning? I have experienced that the GROMACS optimizer is more stable in certain cases than the ASE optimizer that Q-Force uses internally. What you need is adding the following lines to your settings file:

[scan] method = gromacs

If this doesn't work, I would like to check the issue more carefully. In that case could you provide all the files in the directory ala-ala_qforce (most importantly, ORCA outputs) so I can easily recreate your issue?

I hope these answers help and good luck with your project!

Cheers, Selim

xiki-tempula commented 2 years ago

@simonlichtinger Interesting, I choose r2-scan-3c as the default optimisation method as it is cheap and seems to be rather accurate. But given that this is GGA, it is definitely less accurate than hybrid, but I wouldn't have thought that it is that bad. Can I have the ORCA input file which cyclised the dipeptide so I could have a look. For this particular file, do you mind try B97-3c to see if the problem persists? @selimsami Thanks for catching my mistake.

simonlichtinger commented 2 years ago

@selimsami Thanks for your quick reply and fixing the fragmentation. I've now added the gromacs scan setting and run the fitting (folder is af_qforce, which I am attaching). It works better for some dihedrals, but still doesn't work for others. For example, scan_data_CC_H8C5N1O3_acadd09f79966f92c5e3f9478d7dca49~1:

image

@xiki-tempula The orca input is also attached. Please note that this is not only a problem with r2-scan-3c. If I disable the initial optimisation completely, so just optimise with B3LYP, similar 'reactions' happen. Only when I went to an ab intio method was the molecule stable. (Btw, the cyclisation happened only in one trial from a particular starting conformation, most commonly, the reaction is the transfer of a proton)

The zip file is not uploading. Will try to create a new post containing the zip file only

simonlichtinger commented 2 years ago

Sorry, there seems to be a bug in github when trying to upload the zip file with the additional information. I just get "is not included in the list" - whatever that means. Please find it here for now: https://drive.google.com/file/d/1sRmw8M1G0yQv3RgukX6MmFlwYFn4ITOJ/view?usp=sharing

xiki-tempula commented 2 years ago

@simonlichtinger I think your case is a bit tricky. In this case, it is just that HF is bad and the ab initio MP2 will give the same problem. The thing is that we do the geometry optimisation in vacuum and there is no such thing as a zwitterion dipeptide in vacuum so the QM software are doing weird things. I'm not sure as to what is the best way forward @selimsami what is your thought?

selimsami commented 2 years ago

Thanks for the files, i will check them on Monday.

Regarding the zwitterion, if the molecule is not stable in gas phase, one could add implicit solvent to all QM calculations. That should help?

xiki-tempula commented 2 years ago

@selimsami It would be stable but then the Hessian that we got are the hessian in solvent and we are trying to fit the MM vacuum hessian.

xiki-tempula commented 2 years ago

So the key is that zwitterion is not stable in vacuum r2-scan-3c in vacuum > lose zwitterion MP2/def2-SVP in vacuum > lose zwitterion r2-scan-3c in water > preserve zwitterion Archive.zip

selimsami commented 2 years ago

@simonlichtinger, thanks for the files - I had a look and noticed that in several of your dihedral scans, there is a hydrogen transfer happening. This is probably the reason for the messed up dihedral profiles as you cannot fit this to an MD potential

It would be stable but then the Hessian that we got are the hessian in solvent and we are trying to fit the MM vacuum hessian.

@xiki-tempula, you are right, but I think it would still be an acceptable approach. The MD potential will be much less affected by the presence of water than the QM one. It is also a commonly employed strategy (just to name a few: QMDFF, QUBEKIT) in FF parametrization.

xiki-tempula commented 2 years ago

@selimsami Cool, If you could set up the keyword in solvent and do the Gaussian part, I could add the ORCA part.

simonlichtinger commented 2 years ago

Thanks both! So the ORCA keyword is CPCM(WATER) then? I don't know much about QM calculations but if you confirm that's the right flag then I can try to see if it solves things.

@selimsami The hydrogen transfer is indeed what had been happening in my geometry optimisations unless I went to HF level of theory. But would this really be the reason why some of the fits are so bad? CC_H8C5N1O3_acadd09f79966f92c5e3f9478d7dca49~1, for example, is a simple methyl torsion as far as I can tell and the QM profile looks just that way with 3 evenly spaces minima, so there should be a good fit to it? Or am I misunderstanding the way these fits work? Sorry for the very unknowledgable comments here!

xiki-tempula commented 2 years ago

@simonlichtinger In the easiest sense yes. I will need to check the current best practice though as I don't know if the latest gaussian charge scheme supports analytical hessian.

selimsami commented 2 years ago

Thanks both! So the ORCA keyword is CPCM(WATER) then? I don't know much about QM calculations but if you confirm that's the right flag then I can try to see if it solves things.

Probably @xiki-tempula knows the ORCA keyword better. Just make sure to use it at every single QM calculations (opt, hessian, charges, scans, etc). I will in the future add a way to do this automatically.

The hydrogen transfer is indeed what had been happening in my geometry optimisations unless I went to HF level of theory. But would this really be the reason why some of the fits are so bad? CC_H8C5N1O3_acadd09f79966f92c5e3f9478d7dca49~1, for example, is a simple methyl torsion as far as I can tell and the QM profile looks just that way with 3 evenly spaces minima, so there should be a good fit to it? Or am I misunderstanding the way these fits work? Sorry for the very unknowledgable comments here!

All the scans are somewhat connected since you will find that torsion parameter also in other fragments. If one scan fails terribly and very bad parameters are assigned to that torsion, then it can also affect other simpler scans (methyl scan in your case). I would fix the hydrogen transfer issue and see if the problem still persists.

simonlichtinger commented 2 years ago

I have now run the parametrization with qm solvent by adding the CPCM(WATER) keyword to all QM commands. Unfortunately, this does not appear to have fixed the problems of the poor fits (however, at least the hydrogen transfers don't seem to happen anymore as far as I can see) Just two examples:

image image

Unforunately, the github bug still prevents me from attaching a file, but here's all the data:

https://drive.google.com/file/d/19A7MiLsZYXqlPMy-OE-cWJbiChG3JL1F/view?usp=sharing