open-reaction-database / ord-interface

Search/browse interface and APIs for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

broken SMARTS search #117

Open qai222 opened 3 weeks ago

qai222 commented 3 weeks ago

No records found for the following queries:

  1. CC coupling [C:1].[C:2]>>[C:1][C:2]
  2. A reactant in this reaction: [Br:1][C:2]1[CH:9]=[CH:8][C:5]([CH2:6]Br)=[CH:4][CH:3]=1>>
  3. A common reactant C(N(CC)CC)C
  4. A common solvent C1COCC1

The only successful case is an entire reaction SMILES from an existing record, and even that does not always work.

skearnes commented 3 weeks ago

Verified; assigning to @skearnes

qai222 commented 3 weeks ago

FWIW, the PostgreSQL query seems to work properly. Here is what I used (for some reason I had to add a schema prefix to the table names, reaction -> ord.reaction):

SELECT DISTINCT reaction.reaction_id, reaction.proto, reaction.rdkit_reaction_id
FROM ord.reaction
JOIN rdkit.reactions ON rdkit.reactions.id = ord.reaction.rdkit_reaction_id
WHERE rdkit.reactions.reaction @> reaction_from_smarts(%s)

And a few tests:

# reaction_smarts = "C"  # no hit
# reaction_smarts = ">>"  # no hit
# reaction_smarts = "C.C>>C-C"  # works
# reaction_smarts = "[#6:1].[#6:2]>>[#6:1]-[#6:2]"  # works
# reaction_smarts = "[CH3:1][O:2][CH2:3][C:4](=[O:5])[OH:6]"  # no hit, existing reactant
# reaction_smarts = "[CH3:1][O:2][CH2:3][C:4](=[O:5])[OH:6]>>"  # works, existing reactant
# reaction_smarts = "[Na]>>"  # works, returns [Na+]
# reaction_smarts = "[Cu+]>>"  # works, no Cu2+
# reaction_smarts = ">>[F-]"  # works
# reaction_smarts = "C>>"  # works
# reaction_smarts = "C>>C"  # works
# reaction_smarts = ">O1CCCC1>"  # works
# reaction_smarts = "[#6:1]>>[#6:1]"  # works
# reaction_smarts = "[#6:1]>>[#6]"  # works
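The tests above suggest that a bare molecule pattern ("C", or the methoxyacetic acid pattern without ">>") gets no hits, while the same pattern written as a reaction SMARTS ("C>>", ">O1CCCC1>", ">>[F-]") does. That may also explain why queries 3 and 4 in the original report found nothing. A minimal client-side sketch, assuming the role slots behave as these tests suggest (reactants before the first ">", agents such as solvents between the two, products after the second); `as_reaction_smarts` is a hypothetical helper, not part of ord-interface:

```python
# Hypothetical helper (not part of ord-interface): wrap a single-molecule
# SMARTS so it is matched as a reactant, agent, or product. The thread's
# tests suggest bare patterns like "C" get no hits, while "C>>" does.

ROLES = ("reactant", "agent", "product")


def as_reaction_smarts(pattern: str, role: str = "reactant") -> str:
    """Return a reaction SMARTS with `pattern` placed in the given role slot.

    A pattern that already contains ">" is treated as a reaction SMARTS
    and returned unchanged.
    """
    if ">" in pattern:
        return pattern
    if role == "reactant":
        return f"{pattern}>>"
    if role == "agent":
        return f">{pattern}>"
    if role == "product":
        return f">>{pattern}"
    raise ValueError(f"role must be one of {ROLES}, got {role!r}")


# The failing queries 3 and 4 from the report, rewritten as reaction SMARTS.
triethylamine_query = as_reaction_smarts("C(N(CC)CC)C")  # "C(N(CC)CC)C>>"
thf_query = as_reaction_smarts("C1COCC1", role="agent")  # ">C1COCC1>"
```

Whether solvents actually land in the agent slot of the stored RDKit reaction is an assumption based on the ">O1CCCC1>" test, not something confirmed in this thread.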
qai222 commented 3 weeks ago

I didn't find an example of running substructure searches using the ORM; I'd appreciate a pointer.

skearnes commented 2 weeks ago

I think this is likely a server timeout issue; running some queries locally takes quite a while.

skearnes commented 2 weeks ago

> I didn't find an example of running substructure searches using the ORM; I'd appreciate a pointer.

https://github.com/open-reaction-database/ord-schema/pull/722

qai222 commented 2 weeks ago

> I didn't find an example of running substructure searches using the ORM; I'd appreciate a pointer.
>
> open-reaction-database/ord-schema#722

Thanks!

> I think this is likely a server timeout issue; running some queries locally takes quite a while.

I was running them through the interface with the limit set to 5. On my local database, most of them take under 10 s.

qai222 commented 2 weeks ago

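Based on this code in search.py (https://github.com/open-reaction-database/ord-interface/blob/bd9ff55a7d9aa6f3161ab85e5fcc92c75973c311/ord_interface/client/search.py#L71), the limit is only applied after all the queries have run. So it may be a timeout after all: by putting LIMIT explicitly in my SQL query I may have avoided a full table scan, whereas a query sent through the interface always does one.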
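To illustrate the difference, here is a minimal sketch built on the SQL posted earlier in this thread. It assumes a DB-API cursor with %s placeholders (as psycopg uses); both function names are hypothetical and not taken from search.py. In the first version the database returns every match and Python slices the list afterwards; in the second, LIMIT travels with the query, which gives PostgreSQL the chance to stop early when the query plan allows it.

```python
# Sketch only: function names are hypothetical; the SQL is the query posted
# earlier in this thread. `cursor` is any DB-API cursor using %s placeholders.

SUBSTRUCTURE_SQL = """
SELECT DISTINCT reaction.reaction_id, reaction.proto, reaction.rdkit_reaction_id
FROM ord.reaction
JOIN rdkit.reactions ON rdkit.reactions.id = ord.reaction.rdkit_reaction_id
WHERE rdkit.reactions.reaction @> reaction_from_smarts(%s)
"""


def search_then_truncate(cursor, reaction_smarts, limit):
    """Limit applied in Python: the database still computes every match."""
    cursor.execute(SUBSTRUCTURE_SQL, (reaction_smarts,))
    return cursor.fetchall()[:limit]


def search_with_sql_limit(cursor, reaction_smarts, limit):
    """Limit sent to the database together with the query."""
    cursor.execute(SUBSTRUCTURE_SQL + "LIMIT %s", (reaction_smarts, limit))
    return cursor.fetchall()
```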

skearnes commented 2 weeks ago

The new API I'm working on fixes this problem by always sending only one query that combines all the various criteria.
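A rough sketch of the single-query idea, assuming each search criterion can be expressed as a WHERE fragment plus its %s parameters. `build_combined_query` and the fragment format are hypothetical and do not describe the actual new API. The fragments are meant to be trusted SQL written by the developer; all user input goes through the parameters.

```python
# Hypothetical sketch: AND all criteria into one statement with one LIMIT,
# instead of running one query per criterion and truncating afterwards.


def build_combined_query(criteria, limit):
    """criteria: list of (sql_fragment, params) pairs; returns (sql, params)."""
    # Parenthesize each fragment so ORs inside a fragment cannot leak out.
    where = " AND ".join(f"({fragment})" for fragment, _ in criteria) or "TRUE"
    # Flatten parameters in the same order the fragments appear.
    params = [p for _, fragment_params in criteria for p in fragment_params]
    sql = (
        "SELECT DISTINCT reaction.reaction_id, reaction.proto\n"
        "FROM ord.reaction\n"
        "JOIN rdkit.reactions ON rdkit.reactions.id = ord.reaction.rdkit_reaction_id\n"
        f"WHERE {where}\n"
        "LIMIT %s"
    )
    return sql, tuple(params) + (limit,)


# Example: triethylamine as a reactant and THF as an agent, at most 5 hits.
sql, params = build_combined_query(
    [
        ("rdkit.reactions.reaction @> reaction_from_smarts(%s)", ("C(N(CC)CC)C>>",)),
        ("rdkit.reactions.reaction @> reaction_from_smarts(%s)", (">C1COCC1>",)),
    ],
    limit=5,
)
```

The resulting pair would then be sent in one round trip, e.g. `cursor.execute(sql, params)` on a psycopg-style cursor.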

On Sat, Jun 15, 2024, 11:13 PM Qianxiang Ai @.***> wrote:

Based on this from search.py https://github.com/open-reaction-database/ord-interface/blob/bd9ff55a7d9aa6f3161ab85e5fcc92c75973c311/ord_interface/client/search.py#L71, the limit was only applied after running all queries. So it maybe timeout after all: with LIMIT explicitly in SQL query I may have avoided a full table scan, but a query sent via the interface always does a full table scan.
