microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Create index on download/data object association #1411

Closed naglepuff closed 1 month ago

naglepuff commented 1 month ago

Fix #1410

Problem

Biosample search is really slow. Its response contains not only biosample-level information, but also related data generations, workflow runs, and data objects. Several improvements are possible, including speeding up existing queries and reducing the total number of queries.

Changes

This PR adds an index on the data_object_id column for table bulk_download_data_object. This speeds up the subqueries to get download statistics for data objects during biosample search.
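To illustrate why the index helps, here is a minimal sketch that compares the query plan for a lookup by data_object_id before and after creating an index. It is not the PR's Alembic migration. It uses an in-memory SQLite database as a stand-in for the project's actual database, and the bulk_download_id column, the index name ix_demo_data_object_id, and the sample rows are all assumptions made for the demo.

```python
import sqlite3

# Assumption: SQLite stands in for the real database so the sketch is
# self-contained. Only the table name and data_object_id come from the PR;
# bulk_download_id and the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bulk_download_data_object ("
    "bulk_download_id TEXT, data_object_id TEXT)"
)
conn.executemany(
    "INSERT INTO bulk_download_data_object VALUES (?, ?)",
    [(f"dl-{i % 50}", f"obj-{i % 1000}") for i in range(10_000)],
)

# Shape of a per-data-object download-count lookup (illustrative only).
QUERY = "SELECT COUNT(*) FROM bulk_download_data_object WHERE data_object_id = ?"


def plan(connection):
    """Return the detail text of SQLite's query plan for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("obj-1",)).fetchall()
    return " ".join(row[3] for row in rows)


plan_before = plan(conn)  # full table scan: no index on data_object_id yet

# Hypothetical index name; the real migration's index name may differ.
conn.execute(
    "CREATE INDEX ix_demo_data_object_id "
    "ON bulk_download_data_object (data_object_id)"
)
plan_after = plan(conn)  # the planner now searches via the index

count = conn.execute(QUERY, ("obj-7",)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index, every lookup scans the whole table; with it, the planner can jump straight to the matching rows, which matters when the lookup runs once per data object in a search response.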

Testing

You can verify that the migration created the correct index by connecting to your local database with psql and running \d bulk_download_data_object (this shows information about a table, including its indexes). If you don't see the new index, try running docker compose run backend nmdc-server migrate and docker compose run backend nmdc-server migrate --ingest-db.

In your local development environment, run some biosample searches (from Swagger, from the data portal proper, from cURL, whatever) before and after switching to this branch and running the migration. You will likely see these queries get faster on this branch, depending on how big your bulk_download_data_object table is.
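If you want a rough number rather than an impression, a small timing helper like the sketch below can be run on each branch. The helper name median_latency and the stand-in fake_search function are hypothetical; in practice you would replace fake_search with whatever call you use to issue a biosample search against your local server.

```python
import statistics
import time


def median_latency(call, repeats=5):
    """Run `call` `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_search():
    # Stand-in for a real biosample search request (assumption for the demo).
    time.sleep(0.01)


latency = median_latency(fake_search, repeats=3)
print(f"median latency: {latency:.3f}s")
```

Using the median of a few runs dampens one-off effects such as a cold cache on the first request.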

eecavanna commented 1 month ago

Thank you for implementing this!

Adding @pkalita-lbl, @shreddd, and @sierra-moxon as reviewers as I will be OOO until 12pm PT.

Once reviewed, I am comfortable with this PR being merged into main even though we have a release scheduled for early next week, given that team members have discussed the creation of this index over the past day or so (and have experimented with such an index, although they didn't create it via Alembic).