swansonk14 / SyntheMol

Combinatorial antibiotic generation
MIT License

Duplicate Reactions #7

Closed OliverBScott closed 6 months ago

OliverBScott commented 6 months ago

Hi there!

I was just browsing the code and noticed that 4 reactions in reactions/real.py are duplicates: https://github.com/swansonk14/SyntheMol/blob/main/synthemol/reactions/real.py#L16-L47. I would be surprised if this was intentional.

Thanks

swansonk14 commented 6 months ago

Hi @OliverBScott,

Thank you for noticing this! The duplicate reactions are actually intentional, although they aren't exactly duplicates. They are duplicates in the sense that our SMARTS reaction templates are exactly the same for all four reactions, but in reality the reactions differ slightly in their catalysts and experimental conditions. Although the SMARTS are the same, the building blocks allowed in each reaction (provided in the reaction_to_building_blocks.pkl file) differ between the reactions, so SyntheMol won't generate the same set of molecules from each of the four reactions.
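To illustrate the point, here is a minimal sketch in plain Python of how four reactions can share one template yet draw on different building blocks. Everything in it is hypothetical: the `Reaction` class, the reaction IDs, the building-block IDs, the placeholder SMARTS string, and the shape of the `allowed` dictionary are all assumptions made for illustration. It does not reflect the actual contents or structure of reaction_to_building_blocks.pkl or the definitions in real.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical stand-in for a reaction: an ID plus a SMARTS template."""
    reaction_id: int
    smarts: str


# Illustrative placeholder template, NOT copied from real.py.
SHARED_SMARTS = "[C:1](=[O:2])[OH].[NH2:3]>>[C:1](=[O:2])[NH:3]"

# Four reactions with identical templates (hypothetical IDs 1 to 4).
reactions = [Reaction(rid, SHARED_SMARTS) for rid in (1, 2, 3, 4)]

# Assumed allow-list: reaction ID -> set of permitted building-block IDs.
# The real pickle file may be organised differently.
allowed = {
    1: {"bb_a", "bb_b"},
    2: {"bb_b", "bb_c"},
    3: {"bb_d"},
    4: {"bb_a", "bb_d"},
}


def usable_building_blocks(reaction, candidates, allowed):
    """Return the candidates that this particular reaction may use, sorted."""
    permitted = allowed.get(reaction.reaction_id, set())
    return sorted(set(candidates) & permitted)


candidates = ["bb_a", "bb_b", "bb_c", "bb_d"]
per_reaction = {
    r.reaction_id: usable_building_blocks(r, candidates, allowed)
    for r in reactions
}

for rid, blocks in per_reaction.items():
    print(rid, blocks)
```

Because the allow-list is keyed by reaction rather than by template, two reactions with the same SMARTS remain distinct entries and can yield different products.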

I hope this helps clarify things!

Best, Kyle

OliverBScott commented 6 months ago

Thanks, this clarifies it for me. One question, though: is this reaction -> building blocks mapping provided by Enamine? I guess it cannot be derived simply through substructure matching in this case.