open-reaction-database / ord-interface

Search/browse interface and APIs for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Exact SMILES match doesn't account for canonicalization #36

Closed: connorcoley closed this issue 2 years ago

connorcoley commented 2 years ago

For example, searching for CCOC(=O)c1sc(C)nc1Oc1ccc(N)c(C(F)(F)F)c1 returns one result, from the processed USPTO data, whereas searching for CC=1SC(=C(N1)OC1=CC(=C(C=C1)N)C(F)(F)F)C(=O)OCC, which is the same molecule written as a different SMILES string, returns one result from the raw USPTO data. A similarity search with a threshold of 1.0 returns both entries.

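Since we don't canonicalize the SMILES stored in the original entries, an "exact" match only finds records whose SMILES string is written identically to the query. Perhaps "exact" matching is therefore not a useful way of searching.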

skearnes commented 2 years ago

What if we made "exact" an alias for "threshold=1.0"?
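A minimal sketch of what that alias could look like, assuming structures are compared by Tanimoto similarity on fingerprints represented as sets of "on" bit indices. The functions `tanimoto`, `resolve_threshold`, and `search` are hypothetical and are not the ord-interface API. The fingerprints in the usage lines are made-up bit sets; real ones would come from a cheminformatics toolkit (e.g., RDKit Morgan fingerprints).

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical fingerprint representation: the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def resolve_threshold(mode: str, threshold: Optional[float] = None) -> float:
    """Map a search mode to a similarity cutoff; "exact" is an alias for 1.0."""
    if mode == "exact":
        return 1.0
    if mode == "similar":
        if threshold is None or not 0.0 <= threshold <= 1.0:
            raise ValueError("similarity search needs a threshold in [0, 1]")
        return threshold
    raise ValueError(f"unknown search mode: {mode!r}")


def search(query_fp: Fingerprint, index: Dict[str, Fingerprint],
           mode: str = "similar", threshold: Optional[float] = None) -> List[str]:
    """Return the sorted IDs of indexed records at or above the cutoff."""
    cutoff = resolve_threshold(mode, threshold)
    return sorted(rid for rid, fp in index.items() if tanimoto(query_fp, fp) >= cutoff)


# Made-up bit sets for illustration only: "processed" and "raw" stand in for the
# two differently written SMILES of the same molecule, which share a fingerprint.
index = {
    "processed": frozenset({1, 5, 9, 42}),
    "raw": frozenset({1, 5, 9, 42}),
    "other": frozenset({1, 5, 7}),
}
query_fp = frozenset({1, 5, 9, 42})
print(search(query_fp, index, mode="exact"))                    # ['processed', 'raw']
print(search(query_fp, index, mode="similar", threshold=0.4))   # ['other', 'processed', 'raw']
```

One caveat with this design: a similarity of 1.0 means the fingerprints are identical, which is slightly weaker than the molecules being identical (for example, stereoisomers can share a fingerprint unless chirality is encoded). Comparing canonical SMILES is the stricter alternative if that distinction matters.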

connorcoley commented 2 years ago

That seems like a perfectly reasonable solution, as long as the search is relatively fast (which it seems to be).