sanger / General-Backlog-Items

Broad bucket to collate backlog items that have no obvious repository
0 stars 0 forks source link

DPL-985 Create a new view in the reporting database for sample tracking #351

Closed javidahmed64592 closed 3 months ago

javidahmed64592 commented 1 year ago

User story A view can be created to generate the sample tracking report using the view created in #350. The end result should be a view which can be used in the following way:

SELECT * FROM sample_tracking_view
WHERE study_name = 'Study name';

The returned result should be the same format as the mock report created in #342 (tab 'Mock report').

Who are the primary contacts for this story TW KT AY JA

Acceptance criteria To be considered successful the solution must allow:

Dependencies This story is blocked by the following dependencies:

References This story has a non-blocking relationship with:

Additional context Information about the report can be found in Sample Tracking Report.

This story is a part of the roadmap in Sample Tracking Proof Of Concept.

javidahmed64592 commented 12 months ago

I have added sample_tracking_view to mlwhd_reporting in UAT. The view uses a CTE block sample_events which is joined multiple times to sample_flowcell_view for the different sequencing stages. This is where the query struggles the most regarding performance so I think most optimisations will be here. These are also currently JOIN instead of LEFT JOIN. Ideally we will use LEFT JOIN so that samples that haven't gone through certain sequencing stages will still be included in the report. The results are grouped by the plate barcode.

Does what I've done make sense and what optimisations can I make to help the performance of the query?

Once optimisations have been made and the views work as expected, I'll then need help testing the output of the report. We'll need to generate reports for different studies and verify that the correct samples and plates are retrieved, the sample counts and dates for each event are correct etc.

javidahmed64592 commented 11 months ago

Now that the views have been optimised, the report is able to be generated in ~30 seconds for one study. As such, I will proceed with these views and add them to the production database.

I've added a page in Confluence outlining all the work I've done on this report and what future work can be done:

Sample Tracking Report