Some tweaks to the CustomFlagCurationResultSerializer mean that the list of CustomFlag.objects.all() is defined at the top of the serializer so it can be accessed in a nicer way with self.custom_flags.
In the ExportProjectResultsView API view, prefetch the related custom flags and users for each CurationResult in the filtered CurationAssignment queryset. The queryset with prefetching can then be passed to the get_json_response and get_csv_response functions to reduce the number of queries executed and the time taken to fetch the results.
Exporting results using either of these methods is vastly improved, however the CSV export seems much more optimized than the JSON export.
JSON export
For a project with 200 results
Runtime: ~20s -> 4s
Number of Queries: ~2500 -> 600
In get_json_response, we prefetch all custom flags relating to the CurationResults that are in the assignments queryset. This improves the response time by eliminating all repetitive queries of the form:
SELECT "custom_flag"."id", "custom_flag"."created_at", "custom_flag"."updated_at", "custom_flag"."key", "custom_flag"."label", "custom_flag"."shortcut" FROM "custom_flag" WHERE "custom_flag"."id" = 1; args=(1,)
SELECT "custom_flag"."id", "custom_flag"."created_at", "custom_flag"."updated_at", "custom_flag"."key", "custom_flag"."label", "custom_flag"."shortcut" FROM "custom_flag" WHERE "custom_flag"."id" = 2; args=(2,)
...
SELECT "custom_flag"."id", "custom_flag"."created_at", "custom_flag"."updated_at", "custom_flag"."key", "custom_flag"."label", "custom_flag"."shortcut" FROM "custom_flag" WHERE "custom_flag"."id" = 8; args=(8,)
There are still a lot of repetitive queries like:
SELECT ••• FROM "curation_portal_user" WHERE "curation_portal_user"."id" = 1
200 similar queries. Duplicated 200 times.
I tried a few ways to prefetch the user IDs but none really had any impact on reducing these queries.
CSV Export
For a project with 200 results
Runtime: ~20s -> 0.2ms
Number of Queries: ~2500 -> 8
In get_csv_response, the improvement is much more significant. We define the custom_flag_results as a dict for each assignment, rather than calling getattr to check each flag result for each assignment in the list comprehension.
These are the queries that get run:
SELECT ••• FROM "curation_project" WHERE "curation_project"."id" = 1
SELECT ••• FROM "curation_portal_user" INNER JOIN "curation_project_owners" ON ("curation_portal_user"."id" = "curation_project_owners"."user_id") WHERE ("curation_project_owners"."project_id" = 1 AND "curation_portal_user"."id" = 1) LIMIT 1
SELECT ••• FROM "curation_portal_user" INNER JOIN "curation_project_owners" ON ("curation_portal_user"."id" = "curation_project_owners"."user_id") WHERE ("curation_project_owners"."project_id" = 1 AND "curation_portal_user"."id" = 1) LIMIT 1
SELECT ••• FROM "custom_flag"
SELECT ••• FROM "curation_assignment" INNER JOIN "curation_result" ON ("curation_assignment"."result_id" = "curation_result"."id") INNER JOIN "curation_variant" ON ("curation_assignment"."variant_id" = "curation_variant"."id") INNER JOIN "curation_portal_user" ON ("curation_assignment"."curator_id" = "curation_portal_user"."id") WHERE ("curation_result"."verdict" IS NOT NULL AND "curation_variant"."project_id" = 1 AND "curation_portal_user"."username" = ...)
SELECT ••• FROM "curation_variant_annotation" WHERE "curation_variant_annotation"."variant_id" IN (...)
SELECT ••• FROM "custom_flag_curation_result" INNER JOIN "custom_flag" ON ("custom_flag_curation_result"."flag_id" = "custom_flag"."id") WHERE "custom_flag_curation_result"."result_id" IN (...)
SELECT ••• FROM "curation_portal_user" WHERE "curation_portal_user"."id" IN (NULL)
This PR improves the result fetching when exporting results as JSON and CSV. Resolves https://github.com/populationgenomics/variant-curation-portal/issues/71.
Some tweaks to the
CustomFlagCurationResultSerializer
mean that the list ofCustomFlag.objects.all()
is defined at the top of the serializer so it can be accessed in a nicer way withself.custom_flags
.In the
ExportProjectResultsView
API view, prefetch the related custom flags and users for each CurationResult in the filtered CurationAssignment queryset. The queryset with prefetching can then be passed to theget_json_response
andget_csv_response
functions to reduce the number of queries executed and the time taken to fetch the results.Exporting results using either of these methods is vastly improved, however the CSV export seems much more optimized than the JSON export.
JSON export
For a project with 200 results
In
get_json_response
, we prefetch all custom flags relating to the CurationResults that are in the assignments queryset. This improves the response time by eliminating all repetitive queries of the form:There are still a lot of repetitive queries like:
I tried a few ways to prefetch the user IDs but none really had any impact on reducing these queries.
CSV Export
For a project with 200 results
In
get_csv_response
, the improvement is much more significant. We define the custom_flag_results as a dict for each assignment, rather than callinggetattr
to check each flag result for each assignment in the list comprehension. These are the queries that get run: