keiserlab / LUNA

MIT License
48 stars 13 forks source link

Issues with inconsistent valence in multiple frames in a MD trajectory #2

Closed jessica-mckinley closed 4 years ago

jessica-mckinley commented 4 years ago

I'm having some problems running FIFP on certain frames in an MD simulation (issue_check.tar.gz)

For structures 11, 16, 21, 22, and 28 the code throws the following (or similar) error:

[14:37:41] Explicit valence for atom # 45 N, 4, is greater than permitted 2020-02-05 14:37:41,534 - Chunk 0 - root - ERROR - groups.py - _get_mol_from_entity - line 760 => Sanitization error: Explicit valence for atom # 45 N, 4, is greater than permitted Traceback (most recent call last): File "/home/jessica/work/keiser/ifp_v2/scripts/mol/groups.py", line 758, in _get_mol_from_entity SanitizeMol(mol_obj, SanitizeFlags.SANITIZE_ALL) ValueError: Sanitization error: Explicit valence for atom # 45 N, 4, is greater than permitted 2020-02-05 14:37:41,535 - Chunk 0 - root - ERROR - groups.py - _calculate_properties - line 706 => not all arguments converted during string formatting Traceback (most recent call last): File "/home/jessica/work/keiser/ifp_v2/scripts/mol/groups.py", line 758, in _get_mol_from_entity SanitizeMol(mol_obj, SanitizeFlags.SANITIZE_ALL) ValueError: Sanitization error: Explicit valence for atom # 45 N, 4, is greater than permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/jessica/work/keiser/ifp_v2/scripts/mol/groups.py", line 638, in _calculate_properties mol_obj = mol_obj or self._get_mol_from_entity(target_compound.get_parent_by_level('M'), compound_selector) File "/home/jessica/work/keiser/ifp_v2/scripts/mol/groups.py", line 762, in _get_mol_from_entity "RDKit. Check the logs for more information." % mol_file) TypeError: not all arguments converted during string formatting

While these structures do produce a fingerprint, the resulting fingerprint is significantly more sparse than other structures which do not exhibit errors (see structures 20 and 21).

Note: execute script after changing the path to FIFP using python test_auto_fifp.py

luponzo86 commented 4 years ago

I had a similar problem with PLEC, which uses the libraries ODDT and RDkit. I partially solved the problem, so maybe it could be helpful in this case too.

I encountered the same error while reading from a PDB file with:

import oddt
mol = next(oddt.toolkit.readfile('pdb', 'protein.pdb'))

A partial solution is to use RDkit directly:

from rdkit import Chem
mol = Chem.MolFromPDBFile('protein.pdb')

then, in case you'd like to import to ODDT:

from oddt.toolkits.rdk import Molecule
oddt_mol = Molecule(mol)

I still get lots of warnings, such as:

WARNING: not removing hydrogen atom without neighbors

but at least I can run PLEC. I suspect there is some problem with removing hydrogens and "sanitize" the molecule, but I am not familiar enough with RDkit to completely understand.

Hope this helps.

alexandrefassio commented 4 years ago

Dear all,

There is already a solution to this problem. You need to use the option:

   opt["amend_mol"] = True

This flag allows one to define if it is necessary to fix a molecule read from PDB.

Basically, it updates incorrect charges and valences based on OpenEye charge model. Also, I correct incorrect nitro, amidine, and guanidine groups.

Right now, I'm writing a standardizer for residues as at some cases Double bonds are incorrectly set to residues.

bests

jessica-mckinley commented 4 years ago

I have tried turning this on but the issue remains the same. The log file confirms that the new command was recognized (napoli.log)

Do you need to pass any additional information for the algorithm to identify which substructure it needs to fix?

alexandrefassio commented 4 years ago

Let me understand what you're doing...

Do you have a PDB file with a protein and ligand, or only with the protein? Are you using the default properties for residues?

jessica-mckinley commented 4 years ago

The PDB file does contain both the protein and the ligand. I used a script to change the ATOM to HETATM only for the ligand and I confirmed the ligand was correctly identified using the following code: for entry in recover_entries_from_entity(structure, get_small_molecules=True): print("\t", entry)

Could you clarify what you mean by default properties for residues? I don't think I changed these but I'm not quite sure what these are.

Also raw structures as well as the code I used to run LUNA are available in the issue_check.tar.gz folder attached in the first message. Note that the test_auto_fifp.py code does not contain the opt["amend_mol"] = True but I confirmed that adding it does not resolve the problem.

alexandrefassio commented 4 years ago

Hi Jessica,

I just fixed this problem and pushed the commit to the main LUNA project. Update it and try now to see if it works for you.

So, when Open Babel identifies quaternary N, it tends to add an additional and invalid implicit hydrogen. As a consequence, the N becomes V5 and the charge is set to 0.

In the past, I fixed this problem, but I tried to fix only quaternary N with valence=5 and degree=5. In your PDBs, we do have an N with valence 5 and that's why I thought it should have worked if you had set amend_mol to True.

However, in your case, we have an amidine, so the degree is 4 and not 5 as expected. As a consequence, the validator didn't try to fix it. Now, I made the validator to check out all N with valence 5 and charge 0. So, it is working now. Enjoy!!!

bests