open-reaction-database / ord-interface

Search/browse interface and APIs for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Problems searching products by SMILES or SMARTS #125

Open bdeadman opened 3 months ago

bdeadman commented 3 months ago

As reported by a user, and observed by me in #122, the chemical searcher is not finding the expected results.

bdeadman commented 3 months ago

Running the following on my (out of date) clone of ord-data, I get 411 records.

SELECT compound.smiles AS smiles, compound.reaction_role FROM ord.compound WHERE smiles LIKE 'OCCS' ;

Running the search in the online interface returns no results for 'Reactants & Reagents' = "OCCS" with the exact, similarity or substructure search options.

The same search with the SMARTS option returned 100 entries (capped by the query limit), but when I repeated the search it returned no results.
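For anyone comparing the SQL count with the interface: a `LIKE` pattern without wildcards is a plain text comparison, so it only counts rows whose stored SMILES string is literally `OCCS`. A minimal sketch of that behaviour, using an in-memory SQLite table as a hypothetical stand-in for `ord.compound` (the rows are made up for illustration, and SQLite's `LIKE` is case-insensitive for ASCII by default, which may differ from the production database):

```python
import sqlite3

# Hypothetical stand-in for ord.compound; these rows are illustrative, not ORD data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (smiles TEXT, reaction_role TEXT)")
conn.executemany(
    "INSERT INTO compound VALUES (?, ?)",
    [
        ("OCCS", "REACTANT"),
        ("SCCO", "REACTANT"),  # same molecule, atoms written in the opposite order
        ("OCCSC", "PRODUCT"),  # contains the text OCCS but is a different molecule
    ],
)


def count(where, param):
    """Count rows matching a WHERE clause with one bound parameter."""
    sql = f"SELECT COUNT(*) FROM compound WHERE {where}"
    return conn.execute(sql, (param,)).fetchone()[0]


like_exact = count("smiles LIKE ?", "OCCS")   # no wildcards: text equality
equals = count("smiles = ?", "OCCS")          # same rows as above here
like_wild = count("smiles LIKE ?", "%OCCS%")  # text containment, not substructure

print(like_exact, equals, like_wild)
```

The string query misses `SCCO` even though it is the same molecule, and the wildcard version is a text match rather than a chemical substructure match, so counts from SQL string queries and from the structure search are not expected to line up exactly.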


bdeadman commented 3 months ago

Running the following on my (out of date) clone of ord-data, I get 181 records.

SELECT compound.smiles AS smiles, compound.reaction_role FROM ord.compound WHERE smiles LIKE 'O=C1C=CC(=O)C=C1' ;

Running the search in the online interface returns no results for 'Reactants & Reagents' = "O=C1C=CC(=O)C=C1" with the exact, similarity or substructure search options. SMARTS search also failed to return results. This was on the production and staging instances of ord.

bdeadman commented 3 months ago

@skearnes @miori-nd who wants this one?

miori-nd commented 3 months ago

I'll look into this


skearnes commented 3 months ago

I looked at the product SMILES search; this is a timeout on the backend. I'll dig into the SQL query and see if I can optimize it.

skearnes commented 3 months ago

I can definitely speed up the "exact" queries. Will push up soon.

skearnes commented 3 months ago

@bdeadman I pushed the exact query fix to prod. The SMARTS patterns listed are invalid; put them in https://smarts.plus/smartsview for testing.

bdeadman commented 3 months ago

Thanks @skearnes. I was working off this resource: https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html. For my future reference, the correct SMARTS pattern would be "[c,C]=[c,C]", and this validates in the tester you linked. The problem was that I hadn't ended the SMARTS with a node (an atom).

I'll run some tests on prod tomorrow.
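A quick pre-check for the mistake described above (a SMARTS that does not end on an atom) could look like the sketch below. It is only a heuristic on the last character, not a parser; `ends_with_atom` and the truncated pattern `"[c,C]="` are hypothetical names and inputs for illustration. A real parser remains the authority, for example the smarts.plus viewer linked above or RDKit's `Chem.MolFromSmarts`, which returns `None` for patterns it cannot parse.

```python
# Characters that act as bond primitives or logical operators in SMARTS.
# A pattern whose last character is one of these has no atom to attach to.
DANGLING = set("-=#:~/\\@!&,;")


def ends_with_atom(smarts: str) -> bool:
    """Heuristic only: True if the pattern does not end on a bond or operator."""
    s = smarts.strip()
    return bool(s) and s[-1] not in DANGLING


print(ends_with_atom("[c,C]=[c,C]"))  # pattern from this thread -> True
print(ends_with_atom("[c,C]="))       # hypothetical truncated pattern -> False
```

Passing this check does not mean a pattern is valid SMARTS; it only catches the specific "dangling bond at the end" case.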

bdeadman commented 3 months ago

Testing on the production interface today. Values in the table are the number of reactions returned; results are limited to 100.

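| Search Term | Exact | Similar (0.5) | Substructure | SMARTS |
| --- | --- | --- | --- | --- |
| "Reactants & Reagents" = | | | | |
| "SCCO" | 100 | 0 | 0 | 0 |
| "OCCS" | 100 | 0 | 0 | 0 |
| "O=C1C=CC(=O)C=C1" | 100 | 0 | 0 | 0 |
| "C1=CC(=O)C=CC1=O" | 100 | 0 | 0 | 0 |
| "c1ccncc1" | 0 | 0 | 100 | 100 |
| "[c,C]=[c,C]" | NA | NA | NA | 100 |
| "[c,C]#[c,C]" | NA | NA | NA | 0 |
| "C#C" | 100 | 0 | 0 | 100 |
| "Products" = | | | | |
| "SCCO" | 0 | 0 | 0 | 0 |
| "OCCS" | 0 | 0 | 0 | 0 |
| "CC=C1C=C(OC)N=C(N)N1" | 1 | 0 | 0 | 0 |
| "c1ccncc1" | 7 | 0 | 100 | 100 |
| "[c,C]=[c,C]" | NA | NA | NA | 100 |
| "[c,C]#[c,C]" | NA | NA | NA | 0 |
| "C#C" | 6 | 0 | 100 | 100 |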
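
One way to read the table is to flag rows that look internally inconsistent. The sketch below assumes that every exact match should also count as a substructure match for the same query (a molecule contains itself), so a row with exact hits but zero substructure hits is worth a second look. The counts are copied from the "Reactants & Reagents" rows above; the name `suspicious` is hypothetical, and the SMARTS-only rows are left out because their other columns are NA.

```python
# Counts copied from the "Reactants & Reagents" rows of the table above:
# query -> (exact, similar, substructure, smarts)
reactant_counts = {
    "SCCO": (100, 0, 0, 0),
    "OCCS": (100, 0, 0, 0),
    "O=C1C=CC(=O)C=C1": (100, 0, 0, 0),
    "C1=CC(=O)C=CC1=O": (100, 0, 0, 0),
    "c1ccncc1": (0, 0, 100, 100),
    "C#C": (100, 0, 0, 100),
}


def suspicious(counts):
    """Return queries that have exact hits but no substructure hits."""
    return [
        query
        for query, (exact, _similar, substructure, _smarts) in counts.items()
        if exact > 0 and substructure == 0
    ]


flagged = suspicious(reactant_counts)
print(flagged)
```

This only points at rows to investigate; it does not say whether the cause is a timeout, an indexing problem, or something else.
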
bdeadman commented 3 months ago

Outcomes from above testing: