Using ligand parameters

slochower commented 3 years ago

I'm using ChimeraX version 1.0 (2020-06-04) and ISOLDE version 1.0.1 on macOS 10.15.4.

When I first load a structure and pop up ISOLDE, the "ISOLDE" menu bar entries are all grey, so I can't select any options there.

For my unknown residue, I've generated an OpenMM-style XML file outside of ISOLDE (antechamber/parmchk2/parmed). After trying to load it, I see an entry in the log file:

Found multiple definitions for atom type: c3

I tried using GAFF v2.11 atom types. Is ISOLDE expecting SYBYL/Tripos or something else?

To be clear, this is how I got my XML file:

antechamber -i XXX.mol2 -fi mol2 -o XXX.gaff2.mol2 -fo mol2 -rn XXX -at gaff2 -an yes -dr no -pf yes -c bcc -nc 0
# `-a yes` to shove all the parameters into the `frcmod`.
parmchk2 -i XXX.gaff2.mol2 -f mol2 -o XXX.gaff2.frcmod -s gaff2 -a yes
# Then use `tleap` to generate a `prmtop` and `inpcrd` pair.

And with ParmEd:

import parmed as pmd

structure = pmd.load_file("XXX.prmtop", "XXX.inpcrd")

params = pmd.amber.AmberParameterSet.from_structure(structure)
params = pmd.openmm.OpenMMParameterSet.from_parameterset(params)

outfile = open("XXX.xml", "w")
params.write(outfile)
outfile.close()

(Since you're loading an XML, I think we could get this to work with OpenFF offxml files too, since those run seamlessly with OpenMM!)

How does residue matching work? The residue name and atom types that I have in my XML file are not perfect matches for what's in the structure. I figured this would fail, but wanted to see what would happen anyway. I'm assuming ISOLDE expects the residue name and atom names to match between the parameters and the PDB. Is that correct?

tristanic commented 3 years ago

Greyed-out menu entries: I'm mystified. Will look into it.

Duplicate atom definitions: try omitting the "-a yes" from your parmchk2 call (ISOLDE already has the GAFF2 forcefield loaded, so your .frcmod should only provide the parameters that aren't standard). The code I use to convert to XML is essentially identical to yours: https://github.com/tristanic/isolde/blob/a9f976c34affbe917de9e1f6a6062ea423e49054/isolde/src/openmm/amberff/amber_convert.py#L110.

Supporting OpenFF: absolutely I'm all for it! Resources permitting, of course... I'm a big fan of the work you've been doing there.

Residue matching is a bit of a hybrid system. For standard residues it uses OpenMM's topology-matching scheme (apart from cysteines and glycosylated amino acids, which are assigned templates explicitly based on bonding). So for those residue names don't matter. For ligands it first tries to assign an explicit template according to the residue name - but if that fails, it will fall back to trying to match by topology. For the most part atom names don't matter - exceptions are the aforementioned cysteines and glycosylated amino acids, and the sugars themselves.

tristanic commented 3 years ago

The code I use to assign templates to residues is here, if you want more details: https://github.com/tristanic/isolde/blob/a9f976c34affbe917de9e1f6a6062ea423e49054/isolde/src/openmm/openmm_interface.py#L3055

Templates loaded with the "Load residue MD template(s)" button get "USER_" prepended to their name, so they take precedence over any built-in templates with the same name. The general strategy is to try to apply explicit templates to as many of the non-polymeric residues as possible, and let the auto-matching handle the rest.

tristanic commented 3 years ago

A couple of points of clarification while I think of them:

if your residue name doesn't match your template, and it happens to match the name of one of the ~15k built-in standard ligands, you'll probably get an error - so residue names do matter in that respect.
While the OpenMM engine is flexible with respect to residue and atom names, ISOLDE's primary purpose is for building and refining models for deposition into the Protein Data Bank. There naming really does matter, so it pays to keep things consistent.

tristanic commented 3 years ago

Feedback from the ChimeraX team: the greyed-out menu seems to be a bug that comes up when you run ISOLDE for the first time in the same ChimeraX session that you used to install it. After restarting ChimeraX, it should work correctly.

slochower commented 3 years ago

Feedback from the ChimeraX team: the greyed-out menu seems to be a bug that comes up when you run ISOLDE for the first time in the same ChimeraX session that you used to install it. After restarting ChimeraX, it should work correctly.

I'll need to process your earlier comments but restarting ChimeraX did seem to fix the issue with the menu!

slochower commented 3 years ago

Duplicate atom definitions: try omitting the "-a yes" from your parmchk2 call (ISOLDE already has the GAFF2 forcefield loaded, so your .frcmod should only provide the parameters that aren't standard). The code I use to convert to XML is essentially identical to yours:

Okay, great. I was able to generate a paired down frcmod and load in my residue-specific parameters. To make sure the atom names match, I extracted the ligand from the PDB and then went through the rest of the workflow (rather than starting from scratch with SMILES or whatever). Then I discovered that the "Load residue MD definition(s)" will look in the current directory, so I moved my files there, and I think that part is working. Does this get copied somewhere else? I'm asking because on subsequent launches of ISOLDE, the log file suggests it is loading this residue's definition from the internal database (i.e., I don't have do the "Load residue MD definition(s)" button again) and I want to make sure it's actually using the parameters that I assigned and not matching some completely different thing that happens to have the same three letter/number code. I'm guessing this where this bit of code https://github.com/tristanic/isolde/blob/a9f976c34affbe917de9e1f6a6062ea423e49054/isolde/src/openmm/openmm_interface.py#L3080 starts looking for the "USER_" prefix, but I'm not sure. I'm also curious because I half-considered doing refinements for the same ligand with different FFs to see if they go anywhere different.

However, there are still a couple quirks. Every single residue shows up in the "Unparameterised residues" list which seems... not correct (I was pointed here by trying to start a simulation and being told that I'm still missing parameters). All of the normal protein amino acids show up there, starting from the first amino acid in the structure and going through the whole sequence, all the way to the waters with residue name HOH. If you click on one, sure, I see that many matches are with a score of -1.000 (which I'm guessing is perfect), but does this mean I have to manually match each of these? I feel as though something that automatically should happen is failing here. Am I missing something obvious?

Edit: I'm guessing a lot of this is missing hydrogens, so let me try adding hydrogens first (d'oh) and then I'll report back.

Edit 2: addh fixed a lot of the unparameterized residues! Still a couple minor ones I'm working through, like things that don't match any templates possibly because they were given the incorrect residue name. I'm trying to cross-reference with the library in CCP4.

tristanic commented 3 years ago

Then I discovered that the "Load residue MD definition(s)" will look in the current directory, so I moved my files there, and I think that part is working. Does this get copied somewhere else? I'm asking because on subsequent launches of ISOLDE, the log file suggests it is loading this residue's definition from the internal database (i.e., I don't have do the "Load residue MD definition(s)" button again) and I want to make sure it's actually using the parameters that I assigned and not matching some completely different thing that happens to have the same three letter/number code.

No - custom ligands have to be loaded fresh each time. The XML files don't have to be in the current directory, but it's true that's where the file browser will open when you click "Load residue MD definition(s)" so that's most convenient. I've thought about collecting previously-used templates into some form of cache, but decided that's more trouble than it's worth. Partly due to the issue you highlight (what if the user wants to try different templates with the same name?) and partly due to possible confidentiality issues on shared machines. The "loading from internal database" message says there's a template with the same name in ISOLDE's store (a .zip file of XML templates for ~15k ligands from the Chemical Components Dictionary).

Glad you picked up on the need to addh. If there are no hydrogens in your model when you try to start it should pop up a window asking if you've forgotten - I guess you had a few somewhere that caused it to bypass this?

slochower commented 3 years ago

No - custom ligands have to be loaded fresh each time. The XML files don't have to be in the current directory, but it's true that's where the file browser will open when you click "Load residue MD definition(s)" so that's most convenient.

Oh, right.

Glad you picked up on the need to addh. If there are no hydrogens in your model when you try to start it should pop up a window asking if you've forgotten - I guess you had a few somewhere that caused it to bypass this?

I guess so. The ligand had hydrogens but nothing in the protein, as far as I know. The trouble that I'm running into now is that some non-protein-residues have names that aren't being matched properly in the library or aren't matched at all. I grabbed a copy of CCP4 and looked through the giant cif file to see what they should match, but I think I've discovered there's no simple way to change residue name without modifying the structure in ChimeraX. I can see the dialog in the "Build Structure" editor but it doesn't appear to have any effect unless I make some changes. I seem to recall there was a way to do this in Chimera (non-X) and I don't believe swapaa will work for these non-standard AAs. It's easy enough to just edit the starting PDB file, but maybe it would also be nifty to have ISOLDE search the ligand dictionary by string? In other words, the residue name might be unclear here because the density wasn't clear, but I want to match these atoms to a POPC lipid so let me pick that to parameterize these atoms?

tristanic commented 3 years ago

To change a residue's name to XXX without modifying any atoms, select it (ctrl-click any atom from it) then use the command:

setattr sel residues name XXX

If you do need to change atoms, the new residue is from the CCD, and there is a substantial topological overlap between old and new, then you can use:

isolde replace ligand sel XXX

This will use graph-matching to trim the residue back to the common core between old and new and rename all the atoms, then build in any missing atoms from the new residue based on the CCD template. This is a work in progress and (amongst other things) not chirality-aware, so use with care.

tristanic commented 3 years ago

By the way, to search all the residues available in components.cif I find http://ligand-expo.rcsb.org/ most useful.

slochower commented 3 years ago

Thanks for your help! I did run into a single index error when trying to build a residue from template, but I was unable to reproduce this. In the meantime, I'm going to close this Issue because I think we addressed all my original problems. If I run into any other quirks, I'll swing by again.

slochower commented 3 years ago

Ah, I don't think I addressed the comment about OpenFF. If you'd like to add support for the OpenFF force fields, I see two possible routes for ISOLDE. The first method is to use the OpenFF toolkit to directly generate the OpenMM system for you, like so: https://github.com/openforcefield/openforcefield/tree/master/examples/using_smirnoff_with_amber_protein_forcefield (following the BRD4 tutorial cells 2 through 6 and update the FF from openff_unconstrained_1.0.0.offxml to version 1.2.0 or whatever).

The second method is a little hacky, but it is more similar to the way you generate ligand parameters currently. Because there is no concept of "atom type" in OpenFF (everything is matched via SMARTS), you can create fake atom types for every single atom and generate a frcmod containing OpenFF parameters for just the ligand. This wouldn't scale for an entire biomolecular simulation because the parameter file would be enormous, but for modestly sized ligands this works surprisingly well. I have a little example notebook that demonstrates the method: https://github.com/openforcefield/openforcefield/issues/706

I can either help directly with this or coordinate with the OpenFF folks (I've moved to industry).

tristanic commented 3 years ago

Thanks! Once I've cleared some more pressing issues from my plate, I'll look into this in earnest.

tristanic commented 3 years ago

Just to give a little more clarity on that: right now I urgently need to tie up a handful of new manuscripts so I can clear my plate to focus on a major fellowship application. So my serious dev time is going to be virtually nonexistent for at least the next month or two (basically some bug fixes, and getting out new release builds for ChimeraX 1.1 and/or 1.2. Fellowship is due in late November - once that's out of the way I'll be back to focusing on real development again.

slochower commented 3 years ago

Good luck 🖋

slochower commented 3 years ago

I'm going to unearth this issue from the sediment because I'm running into a couple of the same issues encountered above.

I'm seeing the greyed out ISOLDE menu entries again. This is not the same session where I installed ISOLDE. All I've done is launched ChimeraX, typed the command open 6eyd; open 3983 from emdb, and then launched ISOLDE from the Tools menu bar. I think I can mostly use ISOLDE from the command line (e.g., isolde sim start) but I can't access that menu. This is ISOLDE 1.1.0 (I know I saw more detailed versioning info when it installed the Python wheel, but I can't find that now) and ChimeraX version 1.1 (2020-10-07) on macOS 10.15.7.
On a colleague's computer (where the menu bar entries weren't greyed out) we tried to select the option to load the ligand definition from AMBER files (I can't check the exact wording), but we didn't observe any additional dialog boxes. Can you remind me whether this option is supposed to pop up a file browse dialog or something else?
We ran into a case where a known ligand (i.e., one that is in Ligand Expo with the correct name) was having an issue with protonation. That is, the residue in the structure already had the appropriate protonation, but ISOLDE complained it was unable to add a hydrogen (to a nitrogen because then there would be too many bonds). I'm curious what's the recommended way forward here: is it to remove all hydrogens and let ISOLDE add them from the template? Is there a way to tell ISOLDE to match the ligand but not try to add hydrogens?

tristanic commented 3 years ago

Issue 1: could you report this via the ChimeraX Help/Report a Bug menu entry, on the machine that has the problem (and preferably when you have a greyed-out menu)? I can't see how it can be an ISOLDE-specific problem and I can't reproduce it on my Mac - reporting it that way will provide extra information about your system and the actions leading to the problem.

Issue 2: the entry in the ISOLDE menu isn't used for loading AMBER parameter files - just converting them to OpenMM ffXML format. To load them, use the "Load residue MD definition(s)" button on the "Sim settings" tab of the ISOLDE panel. I'm afraid you won't get any feedback on that at this stage - to do so will require some changes to OpenMM's forcefield.py to provide a verbose option to its forcefield parser. I've been meaning to do that and make a pull request for a while.

Issue 3: any chance you could share a file with the ligand coordinates (just the ligand itself would be fine) and the MD template (if you're using a custom one)? If it's not a custom template, I suspect I know what the problem is: the template library I'm using for most of the ligands (from https://github.com/nwmoriarty/amber_library - a collaboration between Nigel Moriarty and Dave Case) had a bug in one of their early iterations where almost all nitrogens were treated as amines regardless of geometry. I thought those had all been fixed, but it's possible a few have still slipped through the cracks. If that's the problem, then you're best off re-parameterising the ligand - any custom ligand templates you load at run-time will override built-in ones - and I'll see about getting the built in one replaced. Also need to start seriously looking in to properly supporting Open Force Field for ligands.

slochower commented 3 years ago

Issue 1. Will do.

Issue 2. Okay great. Let me play around a little more. (I'm back already!) I loaded in an appropriate XML (that I created manually, but basically following your same code) and I can see a perfect match in the "Unparameterised Residues" dialog. It says USER_XXX (1.000). However, I get a little dialog telling me that "there are 0 connected atoms in common with the template. At least 3 matching atoms are needed." The atom names in the XML match the atom names in the loaded PDB perfectly. Is it perhaps looking for some other concordance?

Issue 3. Ah, interesting. Let me play with this a bit more. FWIW, I do not see the 3-letter code (that I got from Ligand Expo) in that repository. When trying to match parameters, I see it prefixed by "MC".

tristanic commented 3 years ago

Issue 2: That's a bit odd. A bit of background information: if OpenMM itself doesn't make a perfect match (or if it makes more than one match) the fallback is to use a maximum-common-subgraph approach to find partial matches. This ignores atom names - it only knows about elements and which atom is bonded to which. The only situation I can think of that could lead to the above message is where either the residue in your model or the template is missing all bonds... are you able to share anything to help diagnose further?

Issue 3: Yes, the "MC" stands for Moriarty & Case - I append prefixes to different groups of templates to help keep track of their source. If it's not in their repository any more, the most likely explanation is that they noticed the problem with it and removed it, but haven't run a replacement parameterisation.

slochower commented 3 years ago

Issue 2. My mistake! Atoms matched perfectly but the residue name didn't. Fixed that now and I think everything is working as expected. Do these USER_XXX parameters get cached anywhere or will we need to load them each time?

Issue 3. Also my mistake (partially). The residue is in the repository; I didn't realize that GitHub was truncating the folder view. I'm not positive, but I think there is an issue with the nitrogen valence/parameters. I have not investigated this deeply, but when I tried to parameterize the residue manually, I had to run antechamber with -j 5 to prevent antechamber from moving the double bond. This might require some further investigation on my part.

tristanic / isolde

Using ligand parameters #2