sanger / General-Backlog-Items

Broad bucket to collate backlog items that have no obvious repository
0 stars 0 forks source link

ST0005 As a TL I would like to verify the plates completed in the last week in the report matches a direct query against the MLWH. #393

Open TWJW-SANGER opened 5 months ago

TWJW-SANGER commented 5 months ago

As a TL I would like to verify the plates completed in the last week in the report matches a direct query against the MLWH.

Background The query described on the confluence page Useful Queries for the Multi-LIMS Warehouse under Show me the runs, lanes and tag indexes for all the samples in a study and their status can be modified to get plates completed in the last week. Use this to validate the completed plates provided in the report.

khelwood commented 4 months ago

@TWJW-SANGER What does "completed" mean here? e.g. is it the recorded_at column in iseq_flowcell?

TWJW-SANGER commented 4 months ago

Ah, probably not.

I think it is the run_status.date for run_status_dict.description='qc_complete'. For reference see the Run Life Cycle on the page MultiLims Warehouse Tutorial

khelwood commented 4 months ago

Something like this:

SELECT
    sr.labware_human_barcode,
    rsd.description,
    rs.date
FROM iseq_flowcell AS fc
JOIN iseq_product_metrics AS pm ON (fc.id_iseq_flowcell_tmp=pm.id_iseq_flowcell_tmp)
JOIN iseq_run_status rs ON (pm.id_run=rs.id_run)
JOIN iseq_run_status_dict rsd ON (rs.id_run_status_dict=rsd.id_run_status_dict)
JOIN stock_resource sr ON (fc.id_sample_tmp=sr.id_sample_tmp)
WHERE rsd.description='qc complete'
  AND rs.date >= '2024-05-27'
GROUP BY sr.labware_human_barcode
khelwood commented 4 months ago

Barcodes that seem to be missing from the view: NT1832651T, NT1832652U, FS71987047, NT8387892G, SQPP-53600-J, NT8351870H

TWJW-SANGER commented 4 months ago

Query looks good - except that I think it should be rsd.description = 'qc complete' as this is the end of the data QC process and the point when data is available to the faculty submitters in IRods, which is I think the most important time point for them.

khelwood commented 4 months ago

These barcodes: NT1832651T, NT1832652U, FS71987047, NT8387892G, SQPP-53600-J, NT8351870H

are missing from the view because their studies have not had the event 'sample_manifest.updated' recorded on them, which is required by the sample_flowcell_view.

TWJW-SANGER commented 4 months ago

Interesting... Barcodes starting with NT are Sample Tubes - presumably libraries submitted to SeqOps ? (so is FS71987047)

SQPP-53600-J I am not sure on this. Running the query for all events on a labware from _MLWHEvents 101 And Useful Queries I get

| id      | event_type              | friendly_name | Date                | associated_role_data_available_on_event                                                                                                                                                                                                                                                                                           | metadata                                                                         |
| 4612465 | sample_manifest.created | SQPP-53600-J  | 2024-05-02 12:02:59 | study=HCA Gut Paediatric Teichmann RNA                                                                                                                                                                                                                                                                                            | labware_type=plate,supplier=Sanger Institute                                     |
| 4616540 | s**ample_manifest.updated** | SQPP-53600-J  | 2024-05-08 07:49:03 | sample=PAED_F_GUT14800930,sample=PAED_F_GUT14800931,sample=PAED_F_GUT14800932,study=HCA Gut Paediatric Teichmann RNA                                                                                                                                                                                                              | labware_type=plate,supplier=Sanger Institute                                     |
khelwood commented 4 months ago

@TWJW-SANGER The sample_flowcell_view isn't looking for a sample_manifest.updated event on the labware: it's looking for one matching the study name, because it's how the view excludes data for studies that haven't been used in the last six months (/two years).

khelwood commented 4 months ago

@TWJW-SANGER After updates to the query, the only barcodes missing are NT8351870H and NT8387892G. They are for study "Illumina Controls", which has no sample_manifest.updated event. Also they both reference the same sample, which I don't think the query is written to allow.