Open kate-fie opened 2 months ago
I noticed that rxn.RunReactants can output the same product multiple times. Need to check why this is the case.
@kate-fie I have implemented checks to catch most of the invalid reactions (see hippo.chem)
I hope you can help prevent the following types of duplicates:
Multiple products should only be returned if the SMARTS matches more than once in the reactant, which is not the case for this N-Boc deprotection:
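import json

from IPython.display import display
from rdkit import Chem
from rdkit.Chem import Draw, rdChemReactions

with open('/base_path/RXN_SMARTS_CONSTANTS.json') as f:
    reaction_smarts = json.load(f)

def run_reaction(rxn_smarts, mol_smiles):
    rxn = rdChemReactions.ReactionFromSmarts(rxn_smarts)
    mol = Chem.MolFromSmiles(mol_smiles)
    display(Draw.ReactionToImage(rxn))
    # RunReactants returns a tuple of product tuples, one per template match
    products = rxn.RunReactants((mol,))
    display(Draw.MolToImage(mol))
    if len(products) == 0:
        display('No products')
    else:
        # Only the first product set is drawn; all of them are returned
        product_images = [Draw.MolToImage(p[0]) for p in products]
        display(product_images[0])
    return products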
run_reaction(reaction_smarts['N-Boc_deprotection'], 'CC(C)NC(=O)OC(C)(C)C(=O)Nc1c(NC(=O)OC(C)(C)C)cnn1C')
Reactant:
Exact product returned 6 times:
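One possible explanation, offered as an assumption because the actual template in RXN_SMARTS_CONSTANTS.json is not shown here: RunReactants does not uniquify substructure matches, so if the template writes the tert-butyl group as three separate methyl atoms, those atoms can be mapped onto the reactant's three equivalent methyls in 3! = 6 ways, and each mapping yields the same product. A minimal sketch with a hypothetical template and a simpler reactant:

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Hypothetical template, NOT the one stored in RXN_SMARTS_CONSTANTS.json:
# the tert-butyl group is written as three explicit, interchangeable methyls.
boc_smarts = "[#7:1]-[#6](=[#8])-[#8]-[#6](-[#6&H3])(-[#6&H3])-[#6&H3]>>[#7:1]"
rxn = rdChemReactions.ReactionFromSmarts(boc_smarts)

# Illustrative reactant with exactly one Boc group.
mol = Chem.MolFromSmiles("CNC(=O)OC(C)(C)C")

# One product set is returned per template match, symmetric ones included.
products = rxn.RunReactants((mol,))

smiles = []
for product_set in products:
    product = product_set[0]
    Chem.SanitizeMol(product)  # products come back unsanitised
    smiles.append(Chem.MolToSmiles(product))

print(len(products), "product sets,", len(set(smiles)), "distinct SMILES")
```

If this is the cause, the SMARTS does match only one Boc group, but it matches that group once per symmetric atom ordering, so deduplicating afterwards is a reasonable workaround.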
Since I can't work out why this is happening after some 30 min of Google searching and checking whether unique atom mappings are being returned, I'm just going to return the unique products with this:
from rdkit.Chem import inchi

def unique_molecules_by_inchi(mol_list):
    seen = set()  # A set to store unique InChI keys
    unique_mols = []  # List to store unique molecules
    for mol in mol_list:
        if mol is None:
            continue
        # Generate the InChI key for the molecule
        inchi_key = inchi.MolToInchiKey(mol)
        if inchi_key not in seen:
            print(inchi_key)
            seen.add(inchi_key)
            unique_mols.append(mol)
    return unique_mols
Running it on this example doesn't take too long:
%timeit unique_molecules_by_inchi(products)
> 493 µs ± 1.58 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
If there are 20,000 reactions to perform, it would take about 10 s overall (20,000 × 493 µs ≈ 9.9 s).
N-Boc deprotection updated to be more specific.
Benzyl alcohol deprotection
Previous: [#8&X2:2]-[#6]-[c]1[c][c][c][c][c]1>>[#8&X2:2]
Updated: [#8&X2:2]-[#6&H2]-[c]1[c&H1][c&H1][c&H1][c&H1][c&H1]1>>[#8&X2:2]
N Bn deprotection (Bn = Benzyl)
Previous: [#7&X3:2]-[#6]-[cR1]1[cR1][cR1][cR1][cR1][cR1]1>>[#7&X3:2]
Updated: [#7&X3:2]-[#6&H2]-[c]1[c&H1][c&H1][c&H1][c&H1][c&H1]1>>[#7&X3:2]
TBS alcohol deprotection
Previous: [#6:1]-[#8:2]-[#14](-[#6])(-[#6])-[#6](-[#6])(-[#6])(-[#6])>>[#6:1]-[#8:2]
Updated: [#6:1]-[#8&H0:2]-[#14](-[#6&H3])(-[#6&H3])-[#6](-[#6&H3])(-[#6&H3])(-[#6&H3])>>[#6:1]-[#8:2]
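To see what the extra hydrogen counts buy, the reactant side of a template can be tested on its own as a substructure query. A quick sketch for the benzyl alcohol deprotection patterns above; the two test molecules (benzyl methyl ether and its p-methoxy analogue) are my own illustrative picks, not examples from this issue:

```python
from rdkit import Chem

# Reactant-side patterns copied from the benzyl alcohol deprotection entries.
previous = Chem.MolFromSmarts("[#8&X2:2]-[#6]-[c]1[c][c][c][c][c]1")
updated = Chem.MolFromSmarts(
    "[#8&X2:2]-[#6&H2]-[c]1[c&H1][c&H1][c&H1][c&H1][c&H1]1"
)

# Illustrative molecules (assumptions, not taken from the issue).
test_mols = {
    "benzyl ether": Chem.MolFromSmiles("COCc1ccccc1"),
    "p-methoxybenzyl ether": Chem.MolFromSmiles("COCc1ccc(OC)cc1"),
}

# Record whether each pattern matches each molecule.
results = {
    name: (mol.HasSubstructMatch(previous), mol.HasSubstructMatch(updated))
    for name, mol in test_mols.items()
}

for name, (hits_previous, hits_updated) in results.items():
    print(f"{name}: previous={hits_previous}, updated={hits_updated}")
```

The previous pattern accepts any substituent on the ring, while the updated one requires an unsubstituted phenyl on a CH2, so the substituted ring should only be hit by the previous pattern.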
@mwinokan has pointed out examples of synthesis route errors in this issue.
Things to check/Fix/Implement:
InChIKey uniqueness check after every reaction (EDIT: not a priority because I'm already checking for duplicates)
I also read that the products returned by rxn.RunReactants are not sanitised...
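The two items above could be combined roughly as follows. This is only a sketch: sanitized_unique_products is a hypothetical helper name, the reactant is an illustrative TBS ether, and the template is the updated TBS SMARTS from this issue. Products that fail sanitisation are skipped rather than repaired.

```python
from rdkit import Chem
from rdkit.Chem import inchi, rdChemReactions


def sanitized_unique_products(rxn, reactants):
    """Run rxn, sanitise each product, and keep one molecule per InChIKey."""
    seen = set()
    unique = []
    for product_set in rxn.RunReactants(reactants):
        for product in product_set:
            # catchErrors=True returns the failing operation instead of raising
            failed_op = Chem.SanitizeMol(product, catchErrors=True)
            if failed_op != Chem.SanitizeFlags.SANITIZE_NONE:
                continue  # skip products that cannot be sanitised
            key = inchi.MolToInchiKey(product)
            if key not in seen:
                seen.add(key)
                unique.append(product)
    return unique


# Updated TBS alcohol deprotection template from this issue.
tbs_smarts = (
    "[#6:1]-[#8&H0:2]-[#14](-[#6&H3])(-[#6&H3])"
    "-[#6](-[#6&H3])(-[#6&H3])(-[#6&H3])>>[#6:1]-[#8:2]"
)
rxn = rdChemReactions.ReactionFromSmarts(tbs_smarts)

# Illustrative reactant (an assumption): the TBS ether of ethanol.
mol = Chem.MolFromSmiles("CCO[Si](C)(C)C(C)(C)C")

raw = rxn.RunReactants((mol,))
unique = sanitized_unique_products(rxn, (mol,))
print(len(raw), "raw product sets ->", len(unique), "unique product(s)")
```

Sanitising before generating the key matters here because the InChIKey is computed from the molecule as it stands, and unsanitised products may not have their hydrogen counts and ring information updated yet.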