kate-fie / syndirella

Generates and scores synthetically practical elaborations from fragment screens
https://syndirella.readthedocs.io/en/latest/

Errors in reactions #52

Open kate-fie opened 2 months ago

kate-fie commented 2 months ago

@mwinokan has pointed out examples of synthesis route errors in this issue.

Things to check/fix/implement:

InChIKey uniqueness check after running any reaction (EDIT: not a priority because I'm already checking for duplicates)

I also read that products from this command are not sanitised...

kate-fie commented 2 months ago

I noticed with rxn.RunReactants that multiple of the same products can be output. Need to check why this is the case.

mwinokan commented 2 months ago

@kate-fie I have implemented checks to catch most of the invalid reactions (see hippo.chem)

I hope you can help prevent the following types of duplicates:

Screenshot 2024-04-30 at 21 57 59

kate-fie commented 2 months ago

> I noticed with rxn.RunReactants that multiple of the same products can be output. Need to check why this is the case.

Multiple products should only be returned if the SMARTS matches more than once in the reactant, but that does not appear to be the case for this N-Boc deprotection:

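import json

from IPython.display import display
from rdkit import Chem
from rdkit.Chem import Draw, rdChemReactions

# Reaction SMARTS keyed by reaction name
with open('/base_path/RXN_SMARTS_CONSTANTS.json') as f:
    reaction_smarts = json.load(f)

def run_reaction(rxn_smarts, mol_smiles):
    """Run a single-reactant reaction and display the reaction, the reactant and the first product."""
    rxn = rdChemReactions.ReactionFromSmarts(rxn_smarts)
    mol = Chem.MolFromSmiles(mol_smiles)
    display(Draw.ReactionToImage(rxn))
    # One product tuple is returned per mapping of the reactant template onto mol
    products = rxn.RunReactants((mol,))
    display(Draw.MolToImage(mol))
    if len(products) == 0:
        display('No products')
    else:
        product_images = [Draw.MolToImage(p[0]) for p in products]
        display(product_images[0])
    return products

run_reaction(reaction_smarts['N-Boc_deprotection'], 'CC(C)NC(=O)OC(C)(C)C(=O)Nc1c(NC(=O)OC(C)(C)C)cnn1C')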

image

Reactant: image

Exact product returned 6 times: image

After about 30 minutes of searching and checking whether unique atom mappings are returned, I still can't explain why this happens, so for now I'm just going to return the unique products with this:

from rdkit.Chem import inchi

def unique_molecules_by_inchi(mol_list):
    """Return the molecules from mol_list with distinct InChI keys, keeping first occurrences."""
    seen = set()  # A set to store unique InChI keys
    unique_mols = []  # List to store unique molecules
    for mol in mol_list:
        if mol is None:
            continue
        # Generate the InChI key for the molecule
        inchi_key = inchi.MolToInchiKey(mol)
        if inchi_key not in seen:
            print(inchi_key)
            seen.add(inchi_key)
            unique_mols.append(mol)
    return unique_mols
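
One possible explanation for the repeated product, offered as a guess because the SMARTS stored in RXN_SMARTS_CONSTANTS.json is not shown here: RunReactants returns one product set per mapping of the reactant template onto the molecule, and the mappings are not collapsed by symmetry. If the template spells out the three tert-butyl methyls, they can be assigned in 3! = 6 ways, which would match the six copies seen above. The sketch below uses a hypothetical Boc-removal SMARTS and a small test carbamate (both are assumptions, not the project's data) and deduplicates by canonical SMILES after sanitising each product.

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Hypothetical Boc-removal template, NOT the SMARTS from RXN_SMARTS_CONSTANTS.json.
# The three [CH3] atoms are interchangeable, so the template can map in several ways.
HYPOTHETICAL_BOC_SMARTS = "[N:1]-C(=O)-O-C([CH3])([CH3])[CH3]>>[N:1]"


def unique_product_smiles(rxn_smarts, reactant_smiles):
    """Return (number of product sets, set of canonical SMILES of sanitised products)."""
    rxn = rdChemReactions.ReactionFromSmarts(rxn_smarts)
    reactant = Chem.MolFromSmiles(reactant_smiles)
    product_sets = rxn.RunReactants((reactant,))
    unique = set()
    for product_set in product_sets:
        product = product_set[0]  # single-product template
        try:
            Chem.SanitizeMol(product)  # RunReactants does not sanitise its products
        except Exception:
            continue  # skip products that fail sanitisation
        unique.add(Chem.MolToSmiles(product))
    return len(product_sets), unique


# Boc-protected methylamine as a minimal test reactant (illustrative only).
n_sets, unique = unique_product_smiles(HYPOTHETICAL_BOC_SMARTS, "CNC(=O)OC(C)(C)C")
print(n_sets, unique)
```

If this is the cause, deduplicating after the reaction (by InChI key or canonical SMILES) is a reasonable workaround, since the duplicates are chemically identical.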

Running unique_molecules_by_inchi on this example is fast:

%timeit unique_molecules_by_inchi(products)

> 493 µs ± 1.58 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

With 20,000 reactions to perform, deduplication would add about 10 s overall (20,000 × 493 µs ≈ 9.9 s).
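
The estimate can be reproduced with a few lines of arithmetic. This sketch assumes every reaction costs the same as the timed example, which had only six products; reactions returning more products would take longer per call.

```python
# Mean time per deduplication call reported by %timeit, in seconds.
PER_CALL_SECONDS = 493e-6
# Number of reactions quoted in the comment.
N_REACTIONS = 20_000


def total_overhead_seconds(per_call_seconds, n_calls):
    """Total time spent if each call costs per_call_seconds."""
    return per_call_seconds * n_calls


overhead = total_overhead_seconds(PER_CALL_SECONDS, N_REACTIONS)
print(f"{overhead:.2f} s")
```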

kate-fie commented 2 months ago

Updated the N-Boc deprotection SMARTS to be more specific.

image

kate-fie commented 2 months ago

Benzyl alcohol deprotection

Previous: [#8&X2:2]-[#6]-[c]1[c][c][c][c][c]1>>[#8&X2:2] image

Updated: [#8&X2:2]-[#6&H2]-[c]1[c&H1][c&H1][c&H1][c&H1][c&H1]1>>[#8&X2:2] image
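
A quick way to see what the tighter pattern changes is to match the reactant side of each SMARTS against a plain benzyl ether and a ring-substituted one. The two test SMILES and the helper name below are illustrative assumptions, not molecules from the screen.

```python
from rdkit import Chem

PREVIOUS = "[#8&X2:2]-[#6]-[c]1[c][c][c][c][c]1>>[#8&X2:2]"
UPDATED = "[#8&X2:2]-[#6&H2]-[c]1[c&H1][c&H1][c&H1][c&H1][c&H1]1>>[#8&X2:2]"


def reactant_matches(rxn_smarts, smiles):
    """True if the reactant template (text before '>>') matches the molecule."""
    pattern = Chem.MolFromSmarts(rxn_smarts.split(">>")[0])
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(pattern)


benzyl_ether = "COCc1ccccc1"              # unsubstituted benzyl group
chlorobenzyl_ether = "COCc1ccc(Cl)cc1"    # para-chloro substituted ring

results = {
    ("previous", "benzyl"): reactant_matches(PREVIOUS, benzyl_ether),
    ("previous", "chlorobenzyl"): reactant_matches(PREVIOUS, chlorobenzyl_ether),
    ("updated", "benzyl"): reactant_matches(UPDATED, benzyl_ether),
    ("updated", "chlorobenzyl"): reactant_matches(UPDATED, chlorobenzyl_ether),
}
for key, matched in results.items():
    print(key, matched)
```

The previous pattern accepts any six-membered aromatic carbon ring attached through a carbon, while the updated one requires a CH2 linker and five ring CH atoms, so substituted rings are no longer treated as benzyl groups.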

kate-fie commented 2 months ago

N-Bn deprotection (Bn = benzyl)

Previous: [#7&X3:2]-[#6]-[cR1]1[cR1][cR1][cR1][cR1][cR1]1>>[#7&X3:2] image

Updated: [#7&X3:2]-[#6&H2]-[c]1[c&H1][c&H1][c&H1][c&H1][c&H1]1>>[#7&X3:2] image

kate-fie commented 2 months ago

TBS alcohol deprotection

Previous: [#6:1]-[#8:2]-[#14](-[#6])(-[#6])-[#6](-[#6])(-[#6])(-[#6])>>[#6:1]-[#8:2] image

Updated: [#6:1]-[#8&H0:2]-[#14](-[#6&H3])(-[#6&H3])-[#6](-[#6&H3])(-[#6&H3])(-[#6&H3])>>[#6:1]-[#8:2] image
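
The same kind of check can be sketched for the silyl pattern. Assuming a TBS ether and a TBDPS ether of ethanol as test molecules (chosen here for illustration, not taken from the issue), the previous SMARTS accepts both because any carbon on silicon satisfies [#6], whereas the updated one insists on two Si-methyl groups.

```python
from rdkit import Chem

PREVIOUS_TBS = "[#6:1]-[#8:2]-[#14](-[#6])(-[#6])-[#6](-[#6])(-[#6])(-[#6])>>[#6:1]-[#8:2]"
UPDATED_TBS = (
    "[#6:1]-[#8&H0:2]-[#14](-[#6&H3])(-[#6&H3])"
    "-[#6](-[#6&H3])(-[#6&H3])(-[#6&H3])>>[#6:1]-[#8:2]"
)


def template_hits(rxn_smarts, smiles):
    """True if the reactant side of the reaction SMARTS is found in the molecule."""
    template = Chem.MolFromSmarts(rxn_smarts.split(">>")[0])
    return Chem.MolFromSmiles(smiles).HasSubstructMatch(template)


tbs_ether = "CCO[Si](C)(C)C(C)(C)C"                      # tert-butyldimethylsilyl
tbdps_ether = "CCO[Si](c1ccccc1)(c1ccccc1)C(C)(C)C"      # tert-butyldiphenylsilyl

tbs_results = {
    "previous_tbs": template_hits(PREVIOUS_TBS, tbs_ether),
    "previous_tbdps": template_hits(PREVIOUS_TBS, tbdps_ether),
    "updated_tbs": template_hits(UPDATED_TBS, tbs_ether),
    "updated_tbdps": template_hits(UPDATED_TBS, tbdps_ether),
}
print(tbs_results)
```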