Open mu-wang opened 2 years ago
I believe what's happening is that transform works on the variable part, but hydrogens aren't treated as the variable *[H] but instead are treated as a special case.
If so, I don't remember if transformation from a hydrogen was deliberately not included in the "transform" operation, or if it was an oversight.
As Jérôme and Christian point out, hydrogen transformations were explicitly not included as there would be too many.
The transform option lets you specify a specific hydrogen to consider, by denoting it with an explicit [H] in the SMILES string.
However, that code path has not been used for years and it does not work in the main mmpdb release. (RDKit changed its wildcard representation from [*] to * about five years ago, and mmpdb used a hard-coded [*][H] to recognize the cut hydrogen SMILES fragment.)
The fixed code is in the v3 development version, available from https://github.com/adalke/mmpdb/tree/v3-dev .
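To illustrate the wildcard mismatch described above, here is a minimal sketch in plain Python. It is not mmpdb's actual code: the function names and the set of accepted spellings are assumptions made up for this example. It only shows why an exact comparison against [*][H] stops matching once the toolkit writes the wildcard as a bare *.

```python
# Hypothetical illustration only; these helpers are NOT part of mmpdb.

# Assumed spellings of "a single hydrogen attached to a wildcard atom",
# covering both the older "[*]" and the newer "*" wildcard notation.
HYDROGEN_FRAGMENT_SMILES = frozenset({"[*][H]", "*[H]", "[H][*]", "[H]*"})


def is_cut_hydrogen_hardcoded(fragment_smiles: str) -> bool:
    """Exact comparison against one hard-coded spelling (the fragile approach)."""
    return fragment_smiles == "[*][H]"


def is_cut_hydrogen(fragment_smiles: str) -> bool:
    """Accept either wildcard notation for the cut-hydrogen fragment."""
    return fragment_smiles.strip() in HYDROGEN_FRAGMENT_SMILES


if __name__ == "__main__":
    for smiles in ("[*][H]", "*[H]", "*C"):
        print(smiles, is_cut_hydrogen_hardcoded(smiles), is_cut_hydrogen(smiles))
```

Comparing strings like this is still brittle, since it ignores attachment-point labels and other equivalent SMILES spellings. A real fix would more likely inspect the parsed fragment than its text.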
Hi @adalke, thank you for your help. I will try the v3-dev version of mmpdb.
Hello. I am restarting this thread as I have a follow-up question. I am able to specify a hydrogen to consider by denoting it with an explicit [H] in the SMILES string. I also note that one can specify multiple hydrogens this way and vary them all.
However, it appears that if you specify one or more hydrogens, then only the specified hydrogen position(s) are modified, with the rest of the molecule remaining unchanged. Is there a way to vary both the hydrogen(s) and the rest of the molecule? Or a flag to vary all hydrogens? I understand this may generate large numbers of compounds.
Thanks.
Hi DJ,
Back when we first designed mmpdb, we found that the number of compounds generated can become extremely large if you allow both hydrogen and fragment exchanges. In fact, depending on your database, the number of compounds generated can already become very large if you allow all hydrogens to be replaced. We therefore decided to allow either the exchange of explicit hydrogens or the exchange of fragments that include at least one heavy atom, but not both. You can change this behaviour, but you'd have to hack the code a bit. Before doing that, I'd recommend you test whether the output is still manageable for what you want to do with it: just make all hydrogens explicit in the input molecule and see what happens.
Bests, Christian
The transform rules in mmpdblib appear to miss some apparent cases.
A test case with the following structures:
with some properties:
I performed the fragmentation, index and property loading as instructed.
The indexed pairs make sense.
However, when I run:
I noticed that I cannot get mol2 or mol3, even though the rules mol1->mol2 and mol1->mol3 are included in the index step. Did I miss something here? Thank you for your help.
Here's the explanation output: