populationgenomics / variant-curation-portal

Web application for curating loss of function variants
MIT License
1 stars 0 forks source link

Update project results export with better prefetching #72

Closed EddieLF closed 4 months ago

EddieLF commented 4 months ago

This PR improves the result fetching when exporting results as JSON and CSV. Resolves https://github.com/populationgenomics/variant-curation-portal/issues/71.

Exporting results using either of these methods is vastly improved, however the CSV export seems much more optimized than the JSON export.


JSON export

For a project with 200 results

In get_json_response, we prefetch all custom flags relating to the CurationResults that are in the assignments queryset. This improves the response time by eliminating all repetitive queries of the form:

SELECT "custom_flag"."id", "custom_flag"."created_at", "custom_flag"."updated_at", "custom_flag"."key", "custom_flag"."label", "custom_flag"."shortcut" FROM "custom_flag" WHERE "custom_flag"."id" = 1; args=(1,)
SELECT "custom_flag"."id", "custom_flag"."created_at", "custom_flag"."updated_at", "custom_flag"."key", "custom_flag"."label", "custom_flag"."shortcut" FROM "custom_flag" WHERE "custom_flag"."id" = 2; args=(2,)
...
SELECT "custom_flag"."id", "custom_flag"."created_at", "custom_flag"."updated_at", "custom_flag"."key", "custom_flag"."label", "custom_flag"."shortcut" FROM "custom_flag" WHERE "custom_flag"."id" = 8; args=(8,)

There are still a lot of repetitive queries like:

SELECT ••• FROM "curation_portal_user" WHERE "curation_portal_user"."id" = 1
 200 similar queries.  Duplicated 200 times.

I tried a few ways to prefetch the user IDs but none really had any impact on reducing these queries.

CSV Export

For a project with 200 results

In get_csv_response, the improvement is much more significant. We define the custom_flag_results as a dict for each assignment, rather than calling getattr to check each flag result for each assignment in the list comprehension. These are the queries that get run:

SELECT ••• FROM "curation_project" WHERE "curation_project"."id" = 1 
SELECT ••• FROM "curation_portal_user" INNER JOIN "curation_project_owners" ON ("curation_portal_user"."id" = "curation_project_owners"."user_id") WHERE ("curation_project_owners"."project_id" = 1 AND "curation_portal_user"."id" = 1) LIMIT 1
SELECT ••• FROM "curation_portal_user" INNER JOIN "curation_project_owners" ON ("curation_portal_user"."id" = "curation_project_owners"."user_id") WHERE ("curation_project_owners"."project_id" = 1 AND "curation_portal_user"."id" = 1) LIMIT 1
SELECT ••• FROM "custom_flag"
SELECT ••• FROM "curation_assignment" INNER JOIN "curation_result" ON ("curation_assignment"."result_id" = "curation_result"."id") INNER JOIN "curation_variant" ON ("curation_assignment"."variant_id" = "curation_variant"."id") INNER JOIN "curation_portal_user" ON ("curation_assignment"."curator_id" = "curation_portal_user"."id") WHERE ("curation_result"."verdict" IS NOT NULL AND "curation_variant"."project_id" = 1 AND "curation_portal_user"."username" = ...)
SELECT ••• FROM "curation_variant_annotation" WHERE "curation_variant_annotation"."variant_id" IN (...)
SELECT ••• FROM "custom_flag_curation_result" INNER JOIN "custom_flag" ON ("custom_flag_curation_result"."flag_id" = "custom_flag"."id") WHERE "custom_flag_curation_result"."result_id" IN (...)
SELECT ••• FROM "curation_portal_user" WHERE "curation_portal_user"."id" IN (NULL)