Closed pbuslaev closed 10 months ago
While the patch does solve the issue with the Topology, the next problem occurs when I try to create an Interchange:
ff14sb = ForceField("ff14sb_off_impropers_0.0.3.offxml")
protein_intrcg = Interchange.from_smirnoff(force_field=ff14sb, topology=protein)
This happens because negatively charged cysteine is not part of the force field. It seems it could be ported from AMBER easily, since AMBER14SB supports CYM, which is negatively charged cysteine. Thus, I think that adding CYM to the residue list in the generate and convert scripts of amber-ff-porting should solve the issue. Unfortunately, I cannot do it myself, since I do not have an OpenEye license. I wonder if it is worth trying to rework the convert tools with, say, RDKit. If I read the code correctly, the main dependency of amber-ff-porting on OpenEye is SMARTS generation, which should be possible with RDKit. In general, I think it would be useful to make porting independent of OpenEye, since one might need to port new (e.g. modified) residues to the AMBER force field.
Hi @pbuslaev, thanks for the really great report and PR. It turns out that the substructure loading library DOES have CYM, but for some reason it's not accepting this reasonable-looking input.
For the PDB-loading-substructure stuff - we try to maintain a processing pipeline from the RCSB chemical component dictionary to our substructures. So instead of modifying our big substructure data file directly, I'll point out on that PR where we can modify the processing script to add the patch.
For the AMBER ff14sb port changes, I can't recall why we didn't port CYM initially. I think we just assumed it was so rare that it wasn't worth covering. Getting that pipeline running again will take some time that I don't currently have. I agree it's unfortunate that it's an OpenEye-dependent workflow - IIRC it was because OE had some functionality that we couldn't easily find in RDKit. I'll open an issue on that repo to track the request but I unfortunately don't anticipate having time to action it any time soon. I'll ask around internally to see if anyone can do it though!
Describe the bug
I tried to generate a topology from a PDB file of a system containing a cysteine that is deprotonated but not covalently bonded to another cysteine. Such cysteines are often observed in protein systems, especially when they are close to ions. The toolkit gave me an error saying that some bonds or atoms in the input could not be identified.
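To check whether a given PDB file contains the kind of cysteine described above, a rough pre-check along the following lines may help. This is only a sketch, not part of the toolkit: the function name `find_free_thiolate_cysteines` and the residue-name set `CYS_NAMES` are made up for illustration, and it assumes the file has explicit hydrogens, names the thiol hydrogen HG, follows the fixed-column PDB layout, and declares disulfide bridges through SSBOND records (bridges given only via CONECT would be missed).

```python
from collections import defaultdict

# Hypothetical set of residue names treated as cysteine variants
# (CYM and CYX are the AMBER names for deprotonated and bridged cysteine).
CYS_NAMES = {"CYS", "CYM", "CYX"}


def find_free_thiolate_cysteines(pdb_lines):
    """Return sorted (chain, residue number) pairs for cysteines that have
    an SG atom, no HG atom, and no SSBOND record."""
    atoms = defaultdict(set)  # (chain, resseq) -> atom names seen
    bridged = set()           # residues listed in SSBOND records

    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            if line[17:20].strip() in CYS_NAMES:
                key = (line[21], int(line[22:26]))
                atoms[key].add(line[12:16].strip())
        elif record == "SSBOND":
            # Both partners of the bridge, read from fixed columns.
            bridged.add((line[15], int(line[17:21])))
            bridged.add((line[29], int(line[31:35])))

    return sorted(
        key
        for key, names in atoms.items()
        if "SG" in names and "HG" not in names and key not in bridged
    )
```

Passing the lines of a file such as cyx_test.pdb (for example via `open(path).readlines()`) would list the residues that are likely to need a deprotonated-cysteine substructure.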
To Reproduce
I encountered the problem with openff-toolkit=0.14.3 (clean installation into an empty conda environment).
Output
This is the error message I got:
Computing environment (please complete the following information):
conda list
Additional context
As far as I understand, this happens because negatively charged cysteines that are not terminal but located in the main chain are not present in the data/proteins/aa_residues_substructures_explicit_bond_orders_with_caps_explicit_connectivity.json file.
Files to reproduce
cyx_test.pdb
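The guess under Additional context, that the residue is absent from the substructure file, could be checked with something like the sketch below. It rests on an assumption not confirmed in this thread: that the top-level keys of the JSON file are residue names. The helper name `missing_residues` is hypothetical. A residue name being present does not guarantee that a particular input matches its substructure.

```python
import json


def missing_residues(substructure_json_text, residue_names):
    """Return the residue names that are not top-level keys of the JSON.

    Assumption: the substructure file maps residue names to their
    substructure definitions at the top level.
    """
    data = json.loads(substructure_json_text)
    return [name for name in residue_names if name not in data]
```

For example, reading the file's text and calling `missing_residues(text, ["CYS", "CYM", "CYX"])` would show which cysteine variants have no entry at all.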