Closed JenkeScheen closed 3 years ago
Just do this:
BSS.IO.saveMolecules("tmp_images/m0.pdb", merged._toRegularMolecule(), "PDB")
Or if you want the lambda=1 state:
BSS.IO.saveMolecules("tmp_images/m1.pdb", merged._toRegularMolecule(is_lambda1=True), "PDB")
In your example it is likely failing because you would need to provide a map for all properties that are required by the PDB parser, not just coordinates
and element
. That said, I'm not sure why Mol2 would work. I'll test locally to see if I can see why. Also, if you set
BSS.setVerbose(True)
you'll see the full error trace from Sire, which should tell you more info as to why the parser failed.
(For reference, I didn't write prepareFEP.py
, and wouldn't have chosen to save the lambda=0 state like that.)
It looks like the property map isn't working correctly for PDB for some reason. Not sure why this issue wouldn't have been triggered before. However, using ._toRegularMolecule
is the correct way to do things.
Okay, fixed it. The issue was that I had updated the way we write PDB files within BioSimSpace to inject CONECT records. To do this, I use Sire.Mol.Connectivity
. However, I hadn't passed through the property map from saveMolecules
, so it couldn't extract the molecular coordinates in order to work out the connectivity matrix. I'll just check that my fix doesn't break anything, then push.
Great ! Definitely happy to improve prepareFEP.py, we have to be careful not to break the script accidentally as it is used in Cresset's pipelines.
great, thanks for the quick fix!
Annoyingly this will need to be updated at the Sire level. Despite the Connectivity
object taking a PropertyMap
as an argument, it doesn't actually use it, so the same error is triggered. Should be a simple fix, but will need a rebuild of Sire too.
Okay, I've fixed Sire. Bizarrely, despite taking a PropertyMap
argument, the BondHunter
classes were using another property object to get the names for the coordinates
and element
properties, which were hard-corded to these values, giving no user flexibility.
Once Sire is built, I'll push the BioSimSpace update. Following that, the existing prepareFEP.py
script should work again. (I still note that we should eventually change it to use the ._toRegualrMolecule()
functionality instead.)
just reopening this for now as I'm seeing some odd behaviours when trying to visualise the merged molecule with RDKit. It seems that with the merged.toMolecule()
command there are some formatting issues (at least, RDKit doesn't like the formatting..).
With the above setup of merged
and BSS.IO.saveMolecules("tmp_images/m0.pdb", merged._toRegularMolecule(), "PDB")
, tmp_images/m0.pdb
looks like:
MODEL 1
ATOM 1 C LIG 1 -1.999 21.529 -27.316 1.00 0.00 C
ATOM 2 C LIG 1 -1.316 22.398 -28.167 1.00 0.00 C
ATOM 3 C LIG 1 -1.969 22.948 -29.263 1.00 0.00 C
ATOM 4 C LIG 1 -3.305 22.619 -29.537 1.00 0.00 C
ATOM 5 C LIG 1 -3.988 21.750 -28.667 1.00 0.00 C
ATOM 6 C LIG 1 -3.325 21.210 -27.571 1.00 0.00 C
ATOM 7 CL LIG 1 -5.653 21.341 -28.967 1.00 0.00 CL
ATOM 8 C LIG 1 -4.026 23.160 -30.714 1.00 0.00 C
ATOM 9 O LIG 1 -4.200 22.439 -31.684 1.00 0.00 O
ATOM 10 N LIG 1 -4.499 24.418 -30.650 1.00 0.00 N
ATOM 11 C LIG 1 -5.265 25.127 -31.585 1.00 0.00 C
ATOM 12 C LIG 1 -5.916 24.576 -32.701 1.00 0.00 C
ATOM 13 C LIG 1 -6.724 25.393 -33.487 1.00 0.00 C
ATOM 14 N LIG 1 -6.870 26.680 -33.197 1.00 0.00 N
ATOM 15 C LIG 1 -6.268 27.251 -32.161 1.00 0.00 C
ATOM 16 C LIG 1 -5.490 26.474 -31.306 1.00 0.00 C
ATOM 17 N LIG 1 -6.514 28.598 -31.872 1.00 0.00 N
ATOM 18 C LIG 1 -5.695 29.463 -31.201 1.00 0.00 C
ATOM 19 O LIG 1 -4.612 29.130 -30.766 1.00 0.00 O
ATOM 20 C LIG 1 -6.220 30.860 -30.996 1.00 0.00 C
ATOM 21 CL LIG 1 -1.104 24.059 -30.278 1.00 0.00 CL
ATOM 22 H LIG 1 -7.298 30.950 -31.142 1.00 0.00 H
ATOM 23 H LIG 1 -1.493 21.102 -26.462 1.00 0.00 H
ATOM 24 H LIG 1 -0.290 22.648 -27.968 1.00 0.00 H
ATOM 25 H LIG 1 -3.855 20.533 -26.916 1.00 0.00 H
ATOM 26 H LIG 1 -4.323 24.981 -29.831 1.00 0.00 H
ATOM 27 H LIG 1 -5.791 23.529 -32.939 1.00 0.00 H
ATOM 28 H LIG 1 -7.240 24.981 -34.339 1.00 0.00 H
ATOM 29 H LIG 1 -5.055 26.916 -30.415 1.00 0.00 H
ATOM 30 H LIG 1 -7.369 29.030 -32.182 1.00 0.00 H
ATOM 31 H LIG 1 -5.700 31.537 -31.671 1.00 0.00 H
ATOM 32 H LIG 1 -5.966 31.191 -29.998 1.00 0.00 H
ATOM 33 H LIG 1 -5.460 32.770 -30.540 1.00 0.00 XX
ATOM 34 H LIG 1 -4.610 31.410 -29.770 1.00 0.00 XX
ATOM 35 H LIG 1 -4.360 31.750 -31.490 1.00 0.00 XX
CONECT 1 2 6 23
CONECT 2 3 24
CONECT 3 4 21
CONECT 4 5 8
CONECT 5 6 7
CONECT 6 25
CONECT 8 9 10
CONECT 10 11 26
CONECT 11 12 16
CONECT 12 13 27
CONECT 13 14 28
CONECT 14 15
CONECT 15 16 17
CONECT 16 29
CONECT 17 18 30
CONECT 18 19 20
CONECT 20 22 31 32
ENDMDL
END
1) The XX
notation for dummy atoms breaks the RDKit PDB parser. Here, running both Chem.MolFromPDBFile("tmp_images/m0.pdb", sanitize=True)
and Chem.MolFromPDBFile("tmp_images/m0.pdb", sanitize=False)
return None
. When I substitute XX
with At
, only sanitize=False
returns a molecule:
Generating a 2D depiction doesn't work that well though:
which brings me to:
2) no CONECT records are being written out for the dummy atoms, which means that in this RDKit is guessing bonds (and clearly not doing a good job). For instance using obabel to convert to SDF: obabel -ipdb tmp_images/m0.pdb -O tmp_images/m0.sdf
, we can see the disconnected dummy atoms more clearly:
####################
I think issue 1 should probably be resolved by the user, i.e. substitute XX
with At
(as XX
is intended behaviour, presumably), but writing the CONECTS in issue 2 would help a lot towards depicting a merged molecule.
Okay, fixed it. The issue was that I had updated the way we write PDB files within BioSimSpace to inject CONECT records. To do this, I use
Sire.Mol.Connectivity
. However, I hadn't passed through the property map fromsaveMolecules
, so it couldn't extract the molecular coordinates in order to work out the connectivity matrix. I'll just check that my fix doesn't break anything, then push.
I've not had the time to check whether this actually fixes issue 2, would you expect it to?
No, this won't fix issue 2, since I assume that the Sire connectivity code is ignoring dummy atoms. It uses the element information to work out whether atoms are bonded, and it won't know what element to use for a dummy. (It uses a "bond-hunter" rather than the bonding info from the potential.) I might be able to do some hack where I temporarily replace the dummy atoms with their actual elements, i.e. what they'd be in the other state.
For the first issue, I'm actually not sure what the element should be for a dummy in a PDB file? If there is a standard, then let me know and I can update the parser. It's just writing a string representation of the Sire.Element
object (with some extra stuff for formal charge, if present) so I can just test for dummy and write something else if needed.
It looks like RDKit handles * for dummy.
yes, they do use *
for dummies but I'm not sure what happens if you have a *-*
bond, because they use dummies for other purposes (in fragmentisation to denote attachment points, mainly)
FWIW, substituting carbons in a PDB file with *
will make the RDkit PDB file reader return None (both with and without sanitization) without jumping through some hoops first.
For an input PDB file with dummies denoted with " *" (note the space):
COMPND lig_ejm_42
AUTHOR GENERATED BY OPEN BABEL 2.3.2
HETATM 1 C LIG 1 -1.960 21.550 -27.330 1.00 0.00 C
HETATM 2 C LIG 1 -1.290 22.410 -28.200 1.00 0.00 C
HETATM 3 C LIG 1 -1.960 22.950 -29.290 1.00 0.00 C
HETATM 4 C LIG 1 -3.300 22.620 -29.540 1.00 0.00 C
HETATM 5 C LIG 1 -3.970 21.760 -28.650 1.00 0.00 C
HETATM 6 C LIG 1 -3.290 21.230 -27.560 1.00 0.00 C
HETATM 7 CL LIG 1 -5.640 21.350 -28.920 1.00 0.00 Cl
HETATM 8 C LIG 1 -4.040 23.150 -30.710 1.00 0.00 C
HETATM 9 O LIG 1 -4.230 22.420 -31.670 1.00 0.00 O
HETATM 10 N LIG 1 -4.510 24.410 -30.650 1.00 0.00 N
HETATM 11 C LIG 1 -5.290 25.110 -31.580 1.00 0.00 C
HETATM 12 C LIG 1 -5.960 24.550 -32.680 1.00 0.00 C
HETATM 13 C LIG 1 -6.780 25.360 -33.460 1.00 0.00 C
HETATM 14 N LIG 1 -6.920 26.650 -33.180 1.00 0.00 N
HETATM 15 C LIG 1 -6.300 27.230 -32.160 1.00 0.00 C
HETATM 16 C LIG 1 -5.510 26.460 -31.310 1.00 0.00 C
HETATM 17 N LIG 1 -6.540 28.580 -31.880 1.00 0.00 N
HETATM 18 C LIG 1 -5.710 29.450 -31.230 1.00 0.00 C
HETATM 19 O LIG 1 -4.620 29.120 -30.810 1.00 0.00 O
HETATM 20 C LIG 1 -6.230 30.850 -31.030 1.00 0.00 C
HETATM 21 CL LIG 1 -1.110 24.050 -30.330 1.00 0.00 Cl
HETATM 22 H LIG 1 -6.760 31.250 -31.890 1.00 0.00 H
HETATM 23 H LIG 1 -1.440 21.130 -26.480 1.00 0.00 H
HETATM 24 H LIG 1 -0.260 22.660 -28.020 1.00 0.00 H
HETATM 25 H LIG 1 -3.810 20.560 -26.890 1.00 0.00 H
HETATM 26 H LIG 1 -4.320 24.980 -29.840 1.00 0.00 H
HETATM 27 H LIG 1 -5.840 23.500 -32.910 1.00 0.00 H
HETATM 28 H LIG 1 -7.310 24.940 -34.300 1.00 0.00 H
HETATM 29 H LIG 1 -5.060 26.910 -30.430 1.00 0.00 H
HETATM 30 H LIG 1 -7.400 29.010 -32.180 1.00 0.00 H
HETATM 31 * LIG 1 -5.090 31.750 -30.680 1.00 0.00 *
HETATM 32 H LIG 1 -6.890 30.870 -30.160 1.00 0.00 H
HETATM 33 * LIG 1 -5.460 32.770 -30.540 1.00 0.00 *
HETATM 34 * LIG 1 -4.610 31.410 -29.770 1.00 0.00 *
HETATM 35 * LIG 1 -4.360 31.750 -31.490 1.00 0.00 *
CONECT 1 2 6 23
CONECT 2 3 24 1
CONECT 3 21 4 2
CONECT 4 8 3 5
CONECT 5 4 7 6
CONECT 6 5 1 25
CONECT 7 5
CONECT 8 9 10 4
CONECT 9 8
CONECT 10 11 8 26
CONECT 11 12 16 10
CONECT 12 13 27 11
CONECT 13 28 14 12
CONECT 14 13 15
CONECT 15 14 17 16
CONECT 16 15 11 29
CONECT 17 30 15 18
CONECT 18 17 20 19
CONECT 19 18
CONECT 20 22 18 31 32
CONECT 20
CONECT 21 3
CONECT 22 20
CONECT 23 1
CONECT 24 2
CONECT 25 6
CONECT 26 10
CONECT 27 12
CONECT 28 13
CONECT 29 16
CONECT 30 17
CONECT 31 35 20 33 34
CONECT 31
CONECT 32 20
CONECT 33 31
CONECT 34 31
CONECT 35 31
MASTER 0 0 0 0 0 0 0 0 35 0 35 0
END
Following paolo's gist we can visualise dummy atoms with the following lines:
So, it looks like this should work; might even be worth implementing in BSS.Align()
? We just need to figure out a way to write CONECT records for the dummy atoms.
Okay, that's great. Writing CONECT records will be easy. As mentioned, I'll just copy the element from the opposite state when computing the connectivity (creating a new molecule object). I'll do this later today if I get a chance. Could we just stick with Xx
for dummies and perform a replacement on that instead? That way I won't need to modify the PDB parser.
yes! Just tested, that works too. Of course the parser has to be changed to:
mol_from_pdb = Chem.MolFromPDBBlock(lig_42_pdb_block.replace("XX", "AT"),
sanitize=False, proximityBonding=False)
If we decide to go for that.
Annoyingly using the element at the opposite end state doesn't work, since the bond length is non-standard so it doesn't get picked up. I'm going to need to manually create the connect record from the bond terms in the potential instead.
Okay, that works. Let me know how you get on.
thanks for the implement - it seems that there might be something off with how the dummies are written, still:
MODEL 1
ATOM 1 C LIG 1 -1.999 21.529 -27.316 1.00 0.00 C
ATOM 2 C LIG 1 -1.316 22.398 -28.167 1.00 0.00 C
ATOM 3 C LIG 1 -1.969 22.948 -29.263 1.00 0.00 C
ATOM 4 C LIG 1 -3.305 22.619 -29.537 1.00 0.00 C
ATOM 5 C LIG 1 -3.988 21.750 -28.667 1.00 0.00 C
ATOM 6 C LIG 1 -3.325 21.210 -27.571 1.00 0.00 C
ATOM 7 CL LIG 1 -5.653 21.341 -28.967 1.00 0.00 CL
ATOM 8 C LIG 1 -4.026 23.160 -30.714 1.00 0.00 C
ATOM 9 O LIG 1 -4.200 22.439 -31.684 1.00 0.00 O
ATOM 10 N LIG 1 -4.499 24.418 -30.650 1.00 0.00 N
ATOM 11 C LIG 1 -5.265 25.127 -31.585 1.00 0.00 C
ATOM 12 C LIG 1 -5.916 24.576 -32.701 1.00 0.00 C
ATOM 13 C LIG 1 -6.724 25.393 -33.487 1.00 0.00 C
ATOM 14 N LIG 1 -6.870 26.680 -33.197 1.00 0.00 N
ATOM 15 C LIG 1 -6.268 27.251 -32.161 1.00 0.00 C
ATOM 16 C LIG 1 -5.490 26.474 -31.306 1.00 0.00 C
ATOM 17 N LIG 1 -6.514 28.598 -31.872 1.00 0.00 N
ATOM 18 C LIG 1 -5.695 29.463 -31.201 1.00 0.00 C
ATOM 19 O LIG 1 -4.612 29.130 -30.766 1.00 0.00 O
ATOM 20 C LIG 1 -6.220 30.860 -30.996 1.00 0.00 C
ATOM 21 CL LIG 1 -1.104 24.059 -30.278 1.00 0.00 CL
ATOM 22 H LIG 1 -7.298 30.950 -31.142 1.00 0.00 H
ATOM 23 H LIG 1 -1.493 21.102 -26.462 1.00 0.00 H
ATOM 24 H LIG 1 -0.290 22.648 -27.968 1.00 0.00 H
ATOM 25 H LIG 1 -3.855 20.533 -26.916 1.00 0.00 H
ATOM 26 H LIG 1 -4.323 24.981 -29.831 1.00 0.00 H
ATOM 27 H LIG 1 -5.791 23.529 -32.939 1.00 0.00 H
ATOM 28 H LIG 1 -7.240 24.981 -34.339 1.00 0.00 H
ATOM 29 H LIG 1 -5.055 26.916 -30.415 1.00 0.00 H
ATOM 30 H LIG 1 -7.369 29.030 -32.182 1.00 0.00 H
ATOM 31 H LIG 1 -5.700 31.537 -31.671 1.00 0.00 H
ATOM 32 H LIG 1 -5.966 31.191 -29.998 1.00 0.00 H
ATOM 33 H LIG 1 -5.460 32.770 -30.540 1.00 0.00 XX
ATOM 34 H LIG 1 -4.610 31.410 -29.770 1.00 0.00 XX
ATOM 35 H LIG 1 -4.360 31.750 -31.490 1.00 0.00 XX
CONECT 1 2 6 23
CONECT 2 3 24
CONECT 3 4 21
CONECT 4 5 8
CONECT 5 6 7
CONECT 6 25
CONECT 8 9 10
CONECT 10 11 26
CONECT 11 12 16
CONECT 12 13 27
CONECT 13 14 28
CONECT 14 15
CONECT 15 16 17
CONECT 16 29
CONECT 17 18 30
CONECT 18 19 20
CONECT 20 22 31 32
CONECT 31 33 34 35
ENDMDL
END
or perhaps this is one of the cases where there is an indexing mismatch between RDKit and BSS?
Sorry, what's the issue here, just the way that RDKit is placing the dummies in the visualisation? The CONECT looks okay to me, in at least everything is in ascending numerical order with no duplicates (which few programs actually do). Is RDKit just ignoring the CONECT for the AT atoms? What happens if you turn proximityBonding
on?
My bad, I'd gotten confused into thinking the dummy atoms should be attached to C:19, not C:30. So yes, this works fine although RDKit will of course complain about the hydrogen valencies. Behaviour is as expected:
To answer your question, setting proximityBonding
to True lets RDKit come up with loads of bonds (AT is rather large):
Closing this issue for now. I'll reopen it if I run into a similar use-case again, then I might PR a viz implementation.
Phew, glad it's working. For the life of me I couldn't work out what was wrong and was worried that I'd completely misinterpreted your problem.
Yes, a PR would be much welcome. Something along the lines of drawMapping would be good. The function could take a merged molecule (checking that it is merged with .isMerged()
). The user could also pass is_lambda1=True
to visualise the lambda=1 state instead. (This approach is used elsewhere in the code.)
I'm trying to visualise a merged molecule (i.e. related to https://github.com/michellab/BioSimSpace/issues/209) with RDKit. For this, I need to write a merged molecule to file, then load with RDKit. There is an example available in the playground that shows how to save the lambda 0 state, however this seems to be broken at the moment.
Using the input ligands from https://github.com/michellab/BioSimSpace/issues/209:
triggers an
OSError: Failed to save system to format: 'PDB'
.Writing to
MOL2
works, however 1) this can't be loaded by RDKit and 2) the written file doesn't seem quite right as atom types are all set toDu
: