open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Unique entries in rdkit tables #695

Closed. skearnes closed this 11 months ago

skearnes commented 11 months ago

There are a ton of duplicated entries in the rdkit tables; this slows down dataset ingestion and searching.

Note that this changes the search logic: you must now query the rdkit tables first and then look up the matched SMILES in the ORD tables, instead of joining those tables directly.
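As a rough illustration of the two-step lookup described above, here is a minimal sketch using an in-memory SQLite database. Everything in it is an assumption made for illustration: the table names `rdkit_mols` and `compounds`, the `search` helper, and the use of a `LIKE` prefix match as a stand-in for a real structure search. It is not the ord-schema implementation.

```python
import sqlite3

# Hypothetical schema: one table with unique SMILES (stand-in for the rdkit
# tables) and one table linking reactions to SMILES (stand-in for the ORD tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rdkit_mols (smiles TEXT PRIMARY KEY);
    CREATE TABLE compounds (reaction_id TEXT, smiles TEXT);
    """
)

rows = [("rxn-1", "CCO"), ("rxn-2", "CCO"), ("rxn-3", "c1ccccc1"), ("rxn-4", "CCN")]
conn.executemany("INSERT INTO compounds VALUES (?, ?)", rows)
# INSERT OR IGNORE skips SMILES that are already present, so each one is stored once.
conn.executemany("INSERT OR IGNORE INTO rdkit_mols VALUES (?)", [(s,) for _, s in rows])


def search(conn, pattern):
    """Return reaction IDs whose SMILES match `pattern`, using two queries."""
    # Step 1: find matching SMILES in the deduplicated table.
    # LIKE is only a placeholder for a real structure search.
    matched = [
        row[0]
        for row in conn.execute(
            "SELECT smiles FROM rdkit_mols WHERE smiles LIKE ?", (pattern,)
        )
    ]
    if not matched:
        return []
    # Step 2: look up the matched SMILES in the reaction-level table.
    placeholders = ", ".join("?" * len(matched))
    query = (
        f"SELECT reaction_id FROM compounds "
        f"WHERE smiles IN ({placeholders}) ORDER BY reaction_id"
    )
    return [row[0] for row in conn.execute(query, matched)]


unique_count = conn.execute("SELECT COUNT(*) FROM rdkit_mols").fetchone()[0]
reaction_ids = search(conn, "CC%")
```

With four compound rows but only three distinct SMILES, the expensive matching step in this sketch runs over three rows rather than four; the second query then expands the matches back out to every reaction that uses them.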

codecov[bot] commented 11 months ago

Codecov Report

Merging #695 (e71ceb6) into main (d33da6e) will decrease coverage by 0.23%. Report is 1 commit behind head on main. The diff coverage is 79.16%.


@@            Coverage Diff             @@
##             main     #695      +/-   ##
==========================================
- Coverage   69.56%   69.33%   -0.23%     
==========================================
  Files          23       23              
  Lines        2326     2322       -4     
  Branches      585      589       +4     
==========================================
- Hits         1618     1610       -8     
  Misses        598      598              
- Partials      110      114       +4     
| Files Changed | Coverage Δ | |
|---|---|---|
| ord_schema/orm/scripts/add_datasets.py | 49.01% <42.85%> (-5.70%) | :arrow_down: |
| ord_schema/message_helpers.py | 85.57% <60.00%> (-0.68%) | :arrow_down: |
| ord_schema/orm/conftest.py | 95.83% <100.00%> (ø) | |
| ord_schema/orm/database.py | 93.24% <100.00%> (+0.48%) | :arrow_up: |
| ord_schema/orm/mappers.py | 96.89% <100.00%> (+1.16%) | :arrow_up: |
| ord_schema/orm/rdkit_mappers.py | 91.46% <100.00%> (-1.17%) | :arrow_down: |