kate-fie / syndirella

Generates and scores synthetically practical elaborations from fragment screens
https://syndirella.readthedocs.io/en/latest/

Specific exit vector expansion functionality #16

Open kate-fie opened 7 months ago

kate-fie commented 7 months ago

Pocket size and shape are not considered at all for elaborations. Syndirella could direct elaborations off identified points on the compound that point toward open space in the pocket. If an expansion point cannot be synthetically elaborated because it is an inaccessible atom, that atom could be exchanged for an 'elaboratable' atom.

There are definitely tools out there that already do this, and Steph has done this for the Fragment Network. The question is whether this is worth the time to do.

kate-fie commented 2 months ago

Impact: Reduce the overall number of placements by not placing elaborations that extend into solvent. Problem: Atom mapping from the base compound (where atoms will be labeled by PoseButcher) to each elaboration.
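To make the intended filter concrete, here is a minimal sketch of how a placement could be skipped once a base-to-elaboration atom mapping exists. Everything in it is an assumption for illustration: the mapping is a plain dict of atom indices, the bonds are index pairs, and the per-atom labels are simple strings (the real PoseButcher output format is not shown in this thread). It also assumes that any growth off a solvent-facing atom is enough to skip the elaboration.

```python
def attachment_atoms(base_to_elab, elab_bonds):
    """Return base atoms whose mapped elaboration atom is bonded to a new (unmapped) atom."""
    elab_to_base = {elab_idx: base_idx for base_idx, elab_idx in base_to_elab.items()}
    attach = set()
    for a, b in elab_bonds:
        if a in elab_to_base and b not in elab_to_base:
            attach.add(elab_to_base[a])
        elif b in elab_to_base and a not in elab_to_base:
            attach.add(elab_to_base[b])
    return attach


def should_place(base_to_elab, elab_bonds, base_labels, skip_label="solvent"):
    """Hypothetical rule: skip an elaboration if it grows off any atom labeled as solvent-facing."""
    attach = attachment_atoms(base_to_elab, elab_bonds)
    return not any(base_labels.get(idx) == skip_label for idx in attach)


# Toy example: elaboration atom 3 is new and bonded to the atom mapped from base atom 2
base_to_elab = {0: 0, 1: 1, 2: 2}
elab_bonds = [(0, 1), (1, 2), (2, 3)]
solvent_labels = {0: "pocket A", 1: "pocket A", 2: "solvent"}
pocket_labels = {0: "pocket A", 1: "pocket A", 2: "pocket B"}

place_solvent = should_place(base_to_elab, elab_bonds, solvent_labels)
place_pocket = should_place(base_to_elab, elab_bonds, pocket_labels)
```

Whether "any" or "all" attachment atoms should trigger the skip is a design choice that would need checking against real placements.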

Possible methods (ordered by likelihood of succeeding):

  1. Map from base to original reactant via RXNMapper (atom indices are known exactly from the reaction transformation). Then map from original reactant to reactant superstructures via Kartograf.
  2. Map from base to original reactant via Kartograf. Then map from original reactant to reactant superstructures via Kartograf. Already seeing that mapping just from base to original reactant is hard when many atoms from the reactant aren't found in the base (like with a Boc-protected reactant).
  3. Map from base to elaboration via Kartograf. Cons: Will have to compute elaborations from all of the reactants, which can be a huge number.
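Options 1 and 2 both chain two atom mappings (base to original reactant, then original reactant to superstructure). A minimal sketch of that composition step, assuming each mapping has already been reduced to a plain dict of atom indices; the function name and the toy indices are hypothetical and not part of RXNMapper or Kartograf:

```python
def compose_mappings(base_to_reactant, reactant_to_super):
    """Chain two index mappings; base atoms whose reactant atom has no partner are dropped."""
    return {
        base_idx: reactant_to_super[reactant_idx]
        for base_idx, reactant_idx in base_to_reactant.items()
        if reactant_idx in reactant_to_super
    }


# Toy example: reactant atom 5 is unmapped in the second step, so base atom 4 is lost
base_to_reactant = {0: 2, 1: 3, 4: 5}
reactant_to_super = {2: 7, 3: 8}
base_to_super = compose_mappings(base_to_reactant, reactant_to_super)
```

Atoms lost in either step silently disappear from the composed mapping, so it is probably worth tracking how many base atoms survive the chain.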

https://github.com/kate-fie/syndirella/tree/f700ba98971e4cd20f5fcd2d5181f720a1f7c435/syndirella/tests/vectors

mwinokan commented 2 months ago

Correct me if wrong, but you are already doing the con of 3 at the moment, so saving on placements is still a net win? Obviously don't bother if 1 is the way to go.

kate-fie commented 2 months ago

I've started to experiment with

Option 1: Map from base to original reactant via RXNMapper (atom indices are known exactly from the reaction transformation). Then map from original reactant to reactant superstructures via Kartograf.

What I've figured out:

Basic code to get mapping:

from rdkit import Chem
from rdkit.Chem import rdDistGeom
from kartograf import KartografAtomMapper, SmallMoleculeComponent
from kartograf.atom_aligner import align_mol_shape

# `reaction` and `r1_elabs` are defined elsewhere (not shown here)
random_seed = 42
# run kartograf with SMARTS mol
orig_mol_noh = Chem.MolFromSmarts(reaction['3_r1_smarts'])
Chem.SanitizeMol(orig_mol_noh)
orig_mol = Chem.AddHs(orig_mol_noh, addCoords=True)
elab_mol = Chem.AddHs(Chem.MolFromSmiles(r1_elabs[20]), addCoords=True)
# embed 3D coordinates for both molecules with a fixed seed,
# then wrap each in a SmallMoleculeComponent
rdDistGeom.EmbedMolecule(orig_mol, useRandomCoords=False, randomSeed=random_seed)
orig_smc = SmallMoleculeComponent.from_rdkit(orig_mol)
rdDistGeom.EmbedMolecule(elab_mol, useRandomCoords=False, randomSeed=random_seed)
elab_smc = SmallMoleculeComponent.from_rdkit(elab_mol)
# Align superstructure to original reactant compound
elab_alignOrig = align_mol_shape(elab_smc, ref_mol=orig_smc)
# Get the first suggested mapping from original reactant to superstructure
mapper = KartografAtomMapper(atom_map_hydrogens=True, atom_max_distance=0.95)
kartograf_mapping = next(mapper.suggest_mappings(orig_smc, elab_alignOrig))
kartograf_mapping

Bad mapping, where not all of the atoms in the original reactant (left) can be mapped to the super reactant (right): image

Good mapping, where nearly all atoms in the original reactant are mapped to the super reactant (although the oxygens in the carboxylic acid are switched, which shouldn't be a huge problem): image
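One way the difference between the bad and good mappings above could be put into numbers is the fraction of original-reactant heavy atoms that received a partner, plus a check that mapped atoms share an element. This is only a sketch under stated assumptions: the mapping is a plain dict of atom indices, the element symbols are given as lists, and the function name is hypothetical.

```python
def mapping_report(mapping, orig_elements, elab_elements):
    """Return (heavy-atom coverage, list of mapped heavy atoms whose elements differ)."""
    heavy = [i for i, element in enumerate(orig_elements) if element != "H"]
    mapped_heavy = [i for i in heavy if i in mapping]
    mismatched = [
        i for i in mapped_heavy if orig_elements[i] != elab_elements[mapping[i]]
    ]
    coverage = len(mapped_heavy) / len(heavy) if heavy else 0.0
    return coverage, mismatched


# Toy carboxylic acid fragment where the two oxygens are mapped in swapped order
orig_elements = ["C", "O", "O", "H"]
elab_elements = ["C", "O", "O", "C", "H"]
swapped_mapping = {0: 0, 1: 2, 2: 1, 3: 4}
coverage, mismatched = mapping_report(swapped_mapping, orig_elements, elab_elements)
```

In this toy case the swapped oxygens still pass the element check, consistent with the note above that the swap is probably harmless, so a stricter check would have to compare bond orders or atom environments as well.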

Problems to be aware of/Things I need to check:

How do I validate the mappings?: