For example, searching for CCOC(=O)c1sc(C)nc1Oc1ccc(N)c(C(F)(F)F)c1 gives one result, from the processed USPTO, whereas searching for CC=1SC(=C(N1)OC1=CC(=C(C=C1)N)C(F)(F)F)C(=O)OCC gives one result, from the raw USPTO. These are two SMILES strings for the same molecule, and a similarity search with a threshold of 1.0 returns both entries.
Since we don't canonicalize SMILES in the original entries, perhaps the "exact" match is not actually a useful way of searching.
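One possible direction, sketched below, is to key the "exact" lookup on a canonical form of each SMILES rather than on the stored string. This is only an illustration, not how the search is currently implemented: `build_index`, `exact_search`, and the dataset labels are hypothetical names, and `rdkit_canonical` assumes RDKit is installed and that its default canonical SMILES is an acceptable normal form.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Entry = Tuple[str, str]  # (dataset label, SMILES exactly as stored)


def rdkit_canonical(smiles: str) -> str:
    """Return RDKit's canonical SMILES (assumes RDKit is installed)."""
    from rdkit import Chem  # lazy import so the rest works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


def build_index(
    entries: Iterable[Entry], canonicalize: Callable[[str], str]
) -> Dict[str, List[Entry]]:
    """Group entries by canonical key; the original strings are kept."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for dataset, smiles in entries:
        index[canonicalize(smiles)].append((dataset, smiles))
    return dict(index)


def exact_search(
    index: Dict[str, List[Entry]], query: str, canonicalize: Callable[[str], str]
) -> List[Entry]:
    """Exact match on the canonical form of the query."""
    return index.get(canonicalize(query), [])


if __name__ == "__main__":
    # Hypothetical entries mirroring the two spellings from the report.
    entries = [
        ("processed USPTO", "CCOC(=O)c1sc(C)nc1Oc1ccc(N)c(C(F)(F)F)c1"),
        ("raw USPTO", "CC=1SC(=C(N1)OC1=CC(=C(C=C1)N)C(F)(F)F)C(=O)OCC"),
    ]
    try:
        index = build_index(entries, rdkit_canonical)
        for hit in exact_search(index, entries[0][1], rdkit_canonical):
            print(hit)
    except ImportError:
        print("RDKit not available; skipping the demo")
```

Because the index stores the original strings as values, the raw entries themselves would not need to be rewritten; only the lookup key changes. Whether canonicalizing at index time is cheap enough for the full dataset is an open question this sketch does not address.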