open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Faster rdkit operations in the ORM #733

Closed by skearnes 2 months ago

skearnes commented 2 months ago

Rewrites the RDKit table operations to use subqueries and combined queries. This speeds up processing of each dataset and makes it much easier to rerun the add_datasets.py script without redoing work.
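As a rough illustration of the general idea (not the code in this PR), a single `INSERT ... SELECT` guarded by a `NOT EXISTS` subquery fills a derived table only for rows that have not been processed yet, so rerunning it does no extra work. This sketch uses Python's built-in `sqlite3` rather than PostgreSQL with the RDKit cartridge. The table names (`structures`, `derived`), the column names, and the `add_missing_rows` helper are hypothetical, and `length(smiles)` is only a placeholder for an expensive RDKit computation.

```python
import sqlite3


def add_missing_rows(conn: sqlite3.Connection) -> int:
    """Populate `derived` for structures that have no derived row yet.

    Returns the number of rows inserted by this call.
    """
    before = conn.total_changes
    conn.execute(
        "INSERT INTO derived (structure_id, smiles_length) "
        # length() is a stand-in for an expensive per-row computation.
        "SELECT s.id, length(s.smiles) "
        "FROM structures AS s "
        # The subquery skips rows that an earlier run already handled.
        "WHERE NOT EXISTS ("
        "    SELECT 1 FROM derived AS d WHERE d.structure_id = s.id"
        ")"
    )
    conn.commit()
    return conn.total_changes - before


# Hypothetical in-memory schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE structures (id INTEGER PRIMARY KEY, smiles TEXT NOT NULL);
    CREATE TABLE derived (structure_id INTEGER PRIMARY KEY, smiles_length INTEGER);
    INSERT INTO structures (smiles) VALUES ('CCO'), ('c1ccccc1');
    """
)

first_run = add_missing_rows(conn)   # processes both structures
second_run = add_missing_rows(conn)  # finds nothing left to do
total_rows = conn.execute("SELECT COUNT(*) FROM derived").fetchone()[0]
```

Because the filter and the insert happen in one statement inside the database, no intermediate result set has to be pulled into Python, and a second invocation simply inserts zero rows.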

skearnes commented 2 months ago

@bdeadman @qai222 With these changes I was able to do a full reload of the database in about 9 hours (instead of 40+ hours before). I'm running it again now that I've updated the queries to avoid temporary tables.