Closed naglepuff closed 1 month ago
Thank you for implementing this!
Adding @pkalita-lbl, @shreddd, and @sierra-moxon as reviewers as I will be OOO until 12pm PT.
Once reviewed, I am comfortable with this PR being merged into main even though we have a release scheduled for early next week, given that team members have discussed the creation of this index over the past day or so (and have experimented with such an index, although they didn't create it via Alembic).
Fix #1410
Problem
Biosample search is really slow. Its response contains not only biosample-level information, but also related data generations, workflow runs, and data objects. Several improvements are possible, including speeding up existing queries and reducing the total number of queries.
Changes
This PR adds an index on the data_object_id column of the bulk_download_data_object table. This speeds up the subqueries that get download statistics for data objects during biosample search.
Testing
You can verify that the migration created the correct index by connecting to your local database and running the following (this shows information about a table, including its indices):
\d bulk_download_data_object
If you don't see the new index, try running both of these commands:
docker compose run backend nmdc-server migrate
docker compose run backend nmdc-server migrate --ingest-db
In your local development environment, run some biosample searches (from Swagger, from the data portal proper, from cURL, whatever). After switching to this branch and running the migration, you will likely see these queries speed up, depending on how big your bulk_download_data_object table is.
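To illustrate why the speedup depends on table size, here is a minimal, self-contained sketch that uses an in-memory SQLite database as a stand-in for the real PostgreSQL database. It is not the migration from this PR. The index name and the bulk_download_id column are hypothetical; only the table name and the data_object_id column come from the description above. It compares the query plan for a lookup by data_object_id before and after the index exists.

```python
import sqlite3

# Hypothetical index name; the name created by the actual migration may differ.
INDEX_NAME = "ix_bulk_download_data_object_data_object_id"

# Stand-in lookup, similar in spirit to a per-data-object download count.
LOOKUP_SQL = (
    "SELECT COUNT(*) FROM bulk_download_data_object WHERE data_object_id = ?"
)


def query_plan(conn, sql, params):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable plan detail is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Simplified table: bulk_download_id is an assumed column for illustration.
conn.execute(
    "CREATE TABLE bulk_download_data_object ("
    "bulk_download_id TEXT, data_object_id TEXT)"
)
conn.executemany(
    "INSERT INTO bulk_download_data_object VALUES (?, ?)",
    [(f"bd-{i % 50}", f"do-{i % 200}") for i in range(5000)],
)

# Without the index, the lookup has to scan every row of the table.
plan_before = query_plan(conn, LOOKUP_SQL, ("do-7",))

conn.execute(
    f"CREATE INDEX {INDEX_NAME} ON bulk_download_data_object (data_object_id)"
)

# With the index, the planner can seek directly to the matching rows.
plan_after = query_plan(conn, LOOKUP_SQL, ("do-7",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, and PostgreSQL's planner makes its own decisions, so treat this only as an illustration of the scan-versus-index-lookup difference. On the real database, running EXPLAIN on the relevant query would be the closer check.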