rdkit-rs / cheminee

The chemistry search stack

Superstructure search #87

Closed: JJ-Pineda closed this pull request 3 months ago

JJ-Pineda commented 3 months ago

Description

Resolves #86 by introducing superstructure search, along with CLI and API endpoints for it. Superstructure search finds compounds that are substructures of the query compound (i.e., compounds contained within the query compound, and therefore typically smaller than it). Like substructure search and identity search, superstructure search uses scaffolds to speed up searches.
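To illustrate why scaffolds can prune a superstructure search, here is a minimal sketch in Rust. It is not cheminee's implementation. It uses a toy molecule model in which "A is a substructure of B" is approximated by set inclusion over hypothetical feature labels (a real search would use a graph substructure match, e.g. through RDKit), and the `Indexed` type and `superstructure_search` function are hypothetical. The pruning rule it relies on: if scaffold S occurs in candidate C and C is a substructure of query Q, then S is also a substructure of Q. So any candidate carrying a scaffold that does not match Q can be skipped before the expensive exact check.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a molecule (assumption): a set of hypothetical feature labels.
/// "Substructure" is approximated here by set inclusion, not a real graph match.
type Mol = BTreeSet<&'static str>;

fn is_substructure(part: &Mol, whole: &Mol) -> bool {
    part.is_subset(whole)
}

/// Hypothetical indexed compound: its structure plus the IDs of the library
/// scaffolds it contains, assumed to be precomputed at indexing time.
struct Indexed {
    id: u32,
    mol: Mol,
    scaffold_ids: BTreeSet<usize>,
}

/// Returns the IDs of indexed compounds that are substructures of `query`.
fn superstructure_search(query: &Mol, scaffolds: &[Mol], index: &[Indexed]) -> Vec<u32> {
    // Indices of the library scaffolds that are themselves substructures of the query.
    let query_scaffolds: BTreeSet<usize> = scaffolds
        .iter()
        .enumerate()
        .filter(|(_, s)| is_substructure(s, query))
        .map(|(i, _)| i)
        .collect();

    index
        .iter()
        // Cheap pre-filter: every scaffold in the candidate must also occur in the query.
        .filter(|c| c.scaffold_ids.is_subset(&query_scaffolds))
        // Expensive exact check, run only on the survivors.
        .filter(|c| is_substructure(&c.mol, query))
        .map(|c| c.id)
        .collect()
}

fn main() {
    let mol = |xs: &[&'static str]| xs.iter().copied().collect::<Mol>();
    let scaffolds = vec![mol(&["benzene"]), mol(&["pyridine"])];
    let index = vec![
        Indexed { id: 1, mol: mol(&["benzene", "OH"]), scaffold_ids: BTreeSet::from([0]) },
        Indexed { id: 2, mol: mol(&["pyridine"]), scaffold_ids: BTreeSet::from([1]) },
        Indexed { id: 3, mol: mol(&["benzene", "NH2"]), scaffold_ids: BTreeSet::from([0]) },
    ];
    let query = mol(&["benzene", "OH", "COOH"]);
    // Compound 2 is pruned by its non-matching scaffold; compound 3 fails the exact check.
    println!("{:?}", superstructure_search(&query, &scaffolds, &index)); // [1]
}
```

The scaffold filter only removes candidates that cannot match, so the exact check still decides the final result; the filter just reduces how many exact checks are run.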

Important Notes

JJ-Pineda commented 3 months ago

For now, this is as fast as superstructure search gets. A comprehensive superstructure search for paclitaxel takes ~7 seconds, but paclitaxel is an atypically complex compound.

I tried adding "NOT" queries for non-matching scaffolds to narrow the search space, but the speedup was negligible (~0.2-0.3 seconds for paclitaxel).
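The "NOT" idea can be sketched as follows: take every indexed scaffold that does not match the query and emit one exclusion clause per scaffold. This is a hypothetical illustration, not the code that was tried. The field name `scaffolds` and the Lucene-style `-field:value` must-not syntax are assumptions, as are the function names `non_matching` and `not_query`.

```rust
use std::collections::BTreeSet;

/// Scaffold IDs in the library that do NOT match the query. Any compound
/// containing one of these cannot be a substructure of the query.
fn non_matching(all: &BTreeSet<usize>, matching: &BTreeSet<usize>) -> BTreeSet<usize> {
    all.difference(matching).copied().collect()
}

/// Renders the exclusions as a query fragment with one clause per scaffold.
/// The field name `scaffolds` and this syntax are assumptions for illustration.
fn not_query(excluded: &BTreeSet<usize>) -> String {
    excluded
        .iter()
        .map(|id| format!("-scaffolds:{}", id))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let all: BTreeSet<usize> = BTreeSet::from([0, 1, 2, 3]);
    let matching: BTreeSet<usize> = BTreeSet::from([0, 2]);
    let excluded = non_matching(&all, &matching);
    println!("{}", not_query(&excluded)); // -scaffolds:1 -scaffolds:3
}
```

Note that the number of clauses grows with the number of non-matching scaffolds in the library, so for a small query against a large scaffold library the exclusion list can be long.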