labdao / plex

Platform for running comp bio applications on distributed compute and storage infrastructure
https://lab.bio
MIT License
55 stars 14 forks source link

LAB-1485 fix gorm queries #982

Closed supraja-968 closed 4 months ago

supraja-968 commented 4 months ago

What type of PR is this?

Description

In this PR, some critical queries (not all) in flows and checkpoints have been optimized for performance. The rationale for transitioning from GORM's 'Preload' method to using 'Joins' for querying: Performance is critical, thus reducing database latency and overhead is a priority.

  1. GORM's 'Preload' executes separate SQL queries to fetch the main record and each associated record. Here’s how it operates.

    • SQL Execution for Preloading:

      • Fetch Job:

        SELECT * FROM jobs WHERE id = $1 LIMIT 1;
      • Fetch Flow:

        SELECT * FROM flows WHERE id IN (SELECT flow_id FROM jobs WHERE id = $1);
      • Fetch Tool:

        SELECT * FROM tools WHERE id IN (SELECT tool_id FROM jobs WHERE id = $1);
    • Data Fetching: Each query results in a separate round trip to the database, increasing the total time spent in data retrieval, especially over networked database connections.

  2. Using Joins integrates the fetching of related data into a single SQL query, improving data retrieval efficiency:

Going even one step ahead,

Instead of selecting all columns with *, specifying only the columns that are necessary for the application's functionality, can reduce memory usage and increase query speed by decreasing the amount of data transferred from the database.

Explanation: This approach only fetches the id and status from jobs, the name from flows, and the type from tools, instead of all columns in these tables.

Performance Consideration

Switching to Joins from Preload in our GORM data access strategy offers a substantial improvement in performance by reducing the number of database round-trips required to fetch a Job and its related Flow and Tool. This is crucial for a performance-critical application like ours, where reducing latency and enhancing query efficiency directly impacts user experience and system scalability.

Another major slowness is coming from fetching all flows for the side nav:

image

this is changed to /flows/names endpoint to fetch only names for the side nav:

image

linear[bot] commented 4 months ago

LAB-1485 1. fix gorm queries with preloads

vercel[bot] commented 4 months ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment | Name | Status | Preview | Comments | Updated (UTC) | | :--- | :----- | :------ | :------- | :------ | | **docs** | ⬜️ Ignored ([Inspect](https://vercel.com/convexitylabs/docs/CZ3tZgrnGCBH1ufcxVWJwh4AbPPR)) | [Visit Preview](https://docs-git-lab-1485-fix-gorm-queries-convexitylabs.vercel.app) | | Jun 24, 2024 11:45pm |