Specific exit vector expansion functionality

kate-fie commented 7 months ago

The pocket size and shape are not considered at all for elaborations. Syndirella could direct elaborations from identified points on the compound that point toward open space in the pocket. If an expansion point cannot be synthetically elaborated because it is an inaccessible atom, that atom could be exchanged for an 'elaboratable' atom.

There are definitely tools out there that already do this, and Steph has done this for the Fragment Network. The question is whether this is worth the time to do.

kate-fie commented 2 months ago

Impact: Reduce the overall number of placements by not placing elaborations that extend into solvent.

Problem: Atom mapping from the base compound (where atoms will be labeled by PoseButcher) to each elaboration.
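A minimal sketch of the intended filter, assuming the mapping problem is already solved. Everything here is hypothetical: the per-atom region labels stand in for whatever PoseButcher reports, the "solvent" label name is an assumption, and `grown_from` is an illustrative record of which base atoms each elaboration grows from.

```python
from typing import Dict, List


def filter_elaborations(
    base_atom_labels: Dict[int, str],
    grown_from: Dict[str, List[int]],
    skip_label: str = "solvent",
) -> List[str]:
    """Return the elaborations that do not grow from a `skip_label` atom."""
    kept = []
    for name, base_atoms in grown_from.items():
        labels = [base_atom_labels.get(idx) for idx in base_atoms]
        # Drop the elaboration only when a growth point is known to face the
        # skipped region; unlabeled atoms are kept (i.e. still placed).
        if any(label == skip_label for label in labels):
            continue
        kept.append(name)
    return kept


# Toy data: base atom index -> region label (hypothetical label names).
base_atom_labels = {0: "pocket", 1: "pocket", 2: "solvent"}
grown_from = {"elab_A": [0], "elab_B": [2], "elab_C": [1, 2], "elab_D": [5]}
kept = filter_elaborations(base_atom_labels, grown_from)
print(kept)
```

Keeping elaborations with unlabeled growth points is a conservative choice: when the mapping fails, the behaviour falls back to placing the compound as is done today.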

Possible methods (ordered by likelihood of succeeding):

1. Map from base to original reactant via RXNMapper (atom indices are known exactly from the reaction transformation). Then map from original reactant to reactant superstructures via Kartograf.
2. Map from base to original reactant via Kartograf. Then map from original reactant to reactant superstructures via Kartograf. I'm already seeing that mapping just from base to original reactant is hard when many atoms from the reactant aren't found in the base (as with a Boc-protected reactant).
3. Map from base to elaboration via Kartograf. Cons: Will have to compute elaborations from all of the reactants, which can be a huge number.

https://github.com/kate-fie/syndirella/tree/f700ba98971e4cd20f5fcd2d5181f720a1f7c435/syndirella/tests/vectors

mwinokan commented 2 months ago

Correct me if wrong, but you are already doing the con of 3 at the moment, so saving on placements is still a net win? Obviously don't bother if 1 is the way to go.

kate-fie commented 2 months ago

I've started to experiment with

Option 1: Map from base to original reactant via RXNMapper (atom indices are known exactly from the reaction transformation). Then map from original reactant to reactant superstructures via Kartograf.

What I've figured out:

RXNMapper works for mapping atom numbers between reactants and product. I just have to make sure to store the atom-mapped SMARTS of the product and reactants of each step, and only for the original route. It can't output explicit hydrogens in the SMARTS, which shouldn't be a problem.
Kartograf can map between the original reactant and a super reactant, although the mapping is not guaranteed to work for every super reactant.
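The two-step chain in Option 1 amounts to composing two index dictionaries. Below is a minimal sketch with toy indices; it assumes the first mapping has already been derived from the RXNMapper atom map numbers and that the second is available as a plain dict of atom indices (for example Kartograf's `componentA_to_componentB`, if that attribute is present in the installed version). The function name is illustrative.

```python
from typing import Dict


def compose_mappings(
    base_to_orig: Dict[int, int],
    orig_to_super: Dict[int, int],
) -> Dict[int, int]:
    """Chain base -> original reactant -> super reactant atom indices."""
    return {
        base_idx: orig_to_super[orig_idx]
        for base_idx, orig_idx in base_to_orig.items()
        if orig_idx in orig_to_super  # skip atoms the second mapping missed
    }


# Toy indices for illustration only.
base_to_orig = {0: 3, 1: 4, 2: 5}   # e.g. derived from RXNMapper atom map numbers
orig_to_super = {3: 7, 4: 8}        # e.g. taken from a Kartograf mapping
base_to_super = compose_mappings(base_to_orig, orig_to_super)

# Base atoms that could not be followed through to the super reactant.
unmapped = sorted(set(base_to_orig) - set(base_to_super))
print(base_to_super, unmapped)
```

Tracking `unmapped` explicitly makes it easy to see which super reactants lost atoms in the second step.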

Basic code to get mapping:

from rdkit import Chem
from rdkit.Chem import rdDistGeom
from kartograf import KartografAtomMapper, SmallMoleculeComponent
from kartograf.atom_aligner import align_mol_shape

# `reaction` and `r1_elabs` come from the route being processed
random_seed = 42
# build the original reactant from its atom-mapped SMARTS
orig_mol_noh = Chem.MolFromSmarts(reaction['3_r1_smarts'])
Chem.SanitizeMol(orig_mol_noh)
orig_mol = Chem.AddHs(orig_mol_noh, addCoords=True)
# build one reactant superstructure from its SMILES
elab_mol = Chem.AddHs(Chem.MolFromSmiles(r1_elabs[20]), addCoords=True)
# embed 3D coordinates for both molecules and wrap them for Kartograf
rdDistGeom.EmbedMolecule(orig_mol, useRandomCoords=False, randomSeed=random_seed)
orig_smc = SmallMoleculeComponent.from_rdkit(orig_mol)
rdDistGeom.EmbedMolecule(elab_mol, useRandomCoords=False, randomSeed=random_seed)
elab_smc = SmallMoleculeComponent.from_rdkit(elab_mol)
# align superstructure to original reactant
elab_alignOrig = align_mol_shape(elab_smc, ref_mol=orig_smc)
# get the first suggested mapping (original reactant -> superstructure)
mapper = KartografAtomMapper(atom_map_hydrogens=True, atom_max_distance=0.95)
kartograf_mapping = next(mapper.suggest_mappings(orig_smc, elab_alignOrig))
kartograf_mapping

Bad mapping where not all of the atoms in original reactant (left) are able to be mapped to super reactant (right):

Good mapping where nearly all atoms in original reactant are mapped to super reactant (although oxygens in carboxylic acid are switched which shouldn't be a huge problem):

Problems to be aware of/Things I need to check:

Because I have to call Chem.AddHs on the mol created from the mapped SMARTS, and the Kartograf mapping is between the atom indices of each mol, I have to build an internal mapping from atom.GetIdx() to atom.GetAtomMapNum() for the original reactant. I need to check that this actually works as expected.
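A minimal sketch of that internal mapping, using a toy atom-mapped SMILES in place of the stored RXNMapper output (the thread stores SMARTS, so parsing as SMILES here is an assumption made to keep the example small). It relies on RDKit appending added hydrogens after the existing atoms and leaving their map number at 0.

```python
from rdkit import Chem

# Toy atom-mapped acetic acid standing in for a stored mapped reactant.
mapped = "[CH3:1][C:2](=[O:3])[OH:4]"
mol = Chem.AddHs(Chem.MolFromSmiles(mapped))

# Heavy atoms keep their map numbers; hydrogens added by AddHs have map number 0,
# so they are left out of the lookup.
idx_to_mapnum = {
    atom.GetIdx(): atom.GetAtomMapNum()
    for atom in mol.GetAtoms()
    if atom.GetAtomMapNum() > 0
}
# Reverse lookup: reaction map number -> atom index in the hydrogenated mol.
mapnum_to_idx = {mapnum: idx for idx, mapnum in idx_to_mapnum.items()}
print(idx_to_mapnum)
```

A quick check that every map number from the stored string appears exactly once in `mapnum_to_idx` would confirm the lookup survived AddHs.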

How do I validate the mappings?

I can't spend an hour to thoroughly check each step in a route.
Best option is to get a list of labeled good vs bad elaborations and check by eye they all make sense.
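One way to decide which mappings to look at first is to rank them by how many heavy atoms of the original reactant were mapped. This is only a sketch: the mappings are plain index dicts, the function names are illustrative, and the 0.8 threshold is an arbitrary assumption rather than a validated cutoff.

```python
from typing import Dict, List, Tuple


def heavy_atom_coverage(mapping: Dict[int, int], orig_heavy_atoms: List[int]) -> float:
    """Fraction of original-reactant heavy atoms that appear in the mapping."""
    if not orig_heavy_atoms:
        return 0.0
    mapped = sum(1 for idx in orig_heavy_atoms if idx in mapping)
    return mapped / len(orig_heavy_atoms)


def triage(
    mappings: Dict[str, Dict[int, int]],
    orig_heavy_atoms: List[int],
    threshold: float = 0.8,  # arbitrary cutoff, for illustration only
) -> List[Tuple[str, float, bool]]:
    """Return (name, coverage, flagged) rows, worst coverage first."""
    rows = []
    for name, mapping in mappings.items():
        cov = heavy_atom_coverage(mapping, orig_heavy_atoms)
        rows.append((name, cov, cov < threshold))
    return sorted(rows, key=lambda row: row[1])


# Toy data: original reactant has five heavy atoms (indices 0-4).
orig_heavy_atoms = [0, 1, 2, 3, 4]
mappings = {
    "super_1": {0: 0, 1: 1, 2: 2, 3: 3, 4: 5},
    "super_2": {0: 0, 1: 1},
}
report = triage(mappings, orig_heavy_atoms)
for name, cov, flagged in report:
    print(name, cov, "CHECK" if flagged else "ok")
```

Coverage alone will not catch swapped atoms such as the two carboxylic acid oxygens, so the by-eye check on a labeled set is still needed.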

kate-fie / syndirella

Specific exit vector expansion functionality #16