luancarvalhomartins / PyAutoFEP

PyAutoFEP: an automated FEP workflow for GROMACS integrating enhanced sampling methods
158 stars 74 forks source link

Solvation issues and enhancements #23

Open insukjoung opened 2 years ago

insukjoung commented 2 years ago

Hi, First of all, thank you for sharing your precious source code.

I am having trouble in topology generation regarding solvation.

https://github.com/luancarvalhomartins/PyAutoFEP/blob/fcc52e914fac9edebc1675dd032421915a2986f4/prepare_dual_topology.py#L1648 In the line, None is passed to guess_water_box, but passing solvate_data['water_model'] might be better. In case guess_water_box fails to guess the type of water, the former cannot help but the latter can resolve the issue as far as the user provides water model.

https://github.com/luancarvalhomartins/PyAutoFEP/blob/fcc52e914fac9edebc1675dd032421915a2986f4/prepare_dual_topology.py#L3364-L3366 Here, ion_conentration: 0 should be ion_concentration: arguments.buildsys_ionconcentration to be consistent with complex solvation.

Please let me know if the issues can be resolved without changing the source code.


Checklist for closing this (Added by @luancarvalhomartins so GitHub can track completion, hope you don't mind me editing your issue):

insukjoung commented 2 years ago

I found other related issues. For pre-solvated systems, gmx genion must be run to neutralize the complex system. Currently, this step is skipped if the system is pre-solvated. When the ligand is charged, it has an effect. The required argument the run must have is -conc 0 -neutral.

insukjoung commented 2 years ago

Here is another related problem.

The topology file generated in the protein leg. The order of LIG is misplaced like:

[ molecules ]
; Compound        #mols
LIG                 1
system1             1
system2             1
SOL         11092
K                   40
CL                  30

The correct order should be like:

[ molecules ]
; Compound        #mols
system1             1
system2             1
LIG                 1
SOL         11092
K                   40
CL                  30

The order is correct in the pdb file though. I put presolvated_protein_chains = 2 in the input configuration file, but I am not sure if the parameter is actually utilized.

insukjoung commented 2 years ago

Here is another related problem.

The topology file generated in the protein leg. The order of LIG is misplaced like:

[ molecules ]
; Compound        #mols
LIG                 1
system1             1
system2             1
SOL         11092
K                   40
CL                  30

The correct order should be like:

[ molecules ]
; Compound        #mols
system1             1
system2             1
LIG                 1
SOL         11092
K                   40
CL                  30

The order is correct in the pdb file though. I put presolvated_protein_chains = 2 in the input configuration file, but I am not sure if the parameter is actually utilized.

https://github.com/luancarvalhomartins/PyAutoFEP/blob/fcc52e914fac9edebc1675dd032421915a2986f4/prepare_dual_topology.py#L1900 The correct molecule name should starts with 'Protein' according to the line. Then, please ignore my previous comment.

luancarvalhomartins commented 2 years ago

Thank you for your detailed report. Sorry for the tardiness, I was taking some days off for the holidays.

Here, ion_conentration: 0 should be ion_concentration: arguments.buildsys_ionconcentration to be consistent with complex solvation.

The use of ion_conentration: 0 was added on purpose, as the water system is often very small and achieving meaningful ionic strengths may be problematic. But I agree it would be useful to allow for selecting ions in the water leg, as long as this is somehow independent from the complex leg ionic strength.

In the line, None is passed to guess_water_box, but passing solvate_data['water_model'] might be better. In case guess_water_box fails to guess the type of water, the former cannot help but the latter can resolve the issue as far as the user provides water model.

Good catch, I am changing that in the next commit.

For pre-solvated systems, gmx genion must be run to neutralize the complex system. Currently, this step is skipped if the system is pre-solvated. When the ligand is charged, it has an effect. The required argument the run must have is -conc 0 -neutral.

Yep, I noted that in: https://github.com/luancarvalhomartins/PyAutoFEP/blob/fcc52e914fac9edebc1675dd032421915a2986f4/prepare_dual_topology.py#L3319 It is not something I would expect to be very common (ie, equilibrating a system with ligand bearing a different charge to the ligand series). But I fixing that.

The correct molecule name should starts with 'Protein' according to the line. Then, please ignore my previous comment.

Yes, that's a current limitation of the prepare_dual_topology.py. I hope I can remove that eventually, but I am not sure how exactly. May be the LIG line could be added before the SOL line (which would require rewriting of some code for pre_solvated systems and some testing for systems with other small molecules/DNA/cofactors). In case the protein name is not Protein, does make_ndx create a Protein group? If so, maybe using the same logic as make_ndx may solve problem. How does is sound? What do you think?

insukjoung commented 2 years ago

Yes, that's a current limitation of the prepare_dual_topology.py. I hope I can remove that eventually, but I am not sure how exactly. May be the LIG line could be added before the SOL line (which would require rewriting of some code for pre_solvated systems and some testing for systems with other small molecules/DNA/cofactors). In case the protein name is not Protein, does make_ndx create a Protein group? If so, maybe using the same logic as make_ndx may solve problem. How does is sound? What do you think?

I am new to gromacs so I don't know if the molecule type can affect the behaviour of make_ndx. I think using pre-defined name is fine as far as it is documented. Maybe you can print out error messages instead of warnings if the name of molecule type is unknown. (I think is is indeed an error :grin:)

Quick test on make_ndx: Make_ndx uses gro file. I find [ Protein ] in the index file if I run gmx make_index -f FullSystem.gro -o index.ndx

luancarvalhomartins commented 2 years ago

Checklist for closing this: