sanger / traction-service

Ruby on Rails exposing a RESTful API for a Long Read LIMS
https://traction.psd.sanger.ac.uk
MIT License

DPL-695 [Traction-Service] Femto QC upload #947

Status: Open. harrietc52 opened this issue 1 year ago.

harrietc52 commented 1 year ago

Description

As Ben, I would like to upload Femto QC results and have them persisted in the MLWH, so that we can perform automated analysis on the samples. (The benefits are described in a comment below.)

Primary contacts for this work: Ben Farr, Hermione Blomfield-Smith, Steve I, Harriet C

harrietc52 commented 1 year ago

Questions

Q: Every other row in the CSV is currently empty. Can we assume empty rows can simply be ignored?

Q: Column D: should it be pg/uL or ng/uL, or do we want to support both? The same applies to Column F: pmole/L or nmole/L.

Q: Will all CSVs have the same columns? If not, what is the full list of potential columns, and which of them are required? For example, Size Threshold (b.p.) and GQN are sometimes empty. Can we assume these empty results would be ignored?

Q: "Only effective if the other important metrics are uploaded into the MLWH" (Ben F). Which other metrics? Metrics for each of the existing columns in the attached CSV, or more (clade, yield, quality, input amounts, lysis type, powermash/cryoprep, date extraction run, date sample caught, team doing extraction…)?

Q: Could we have a description of each column?

Q: There are duplicate Sample IDs in the same well. Is this expected? Is this the same sample with multiple different QC values in the same well? Or is the "Sample ID" actually the name of a Femto trace, of which there can be several in one well? If so, could we rename this field, so that there would be multiple Femto traces in one well?

Q: Are columns C-J all QC values? What do these results represent (Femto traces?) Is each one an individual Femto trace, or do they collectively represent a Femto trace?

Q: Should a row only be processed if there is a Well and Sample ID provided? What identifies a row?

Q: "Whichever direction we go we’ll require a way of standardising the way we name the femto traces. Currently a free text box is used to type the name of the sample" (Ben F). Where is this free-text input for the "Sample ID"/Femto trace name? What would you prefer instead of a free-text field, and what validation or limitations should apply to it?

Q: (Assume this is related to the question above.) "If we wanted to upload the results to the MLWH we’d want a way of restricting how the sample is named so we can interrogate the data effectively in tableau" (Ben F). How would they want to restrict the sample name, and why? Make it unique per well? How would they interrogate the data in Tableau?

Q: "Data would need to be uploaded in a consistent format e.g. always scan the tissue tube ID (FD20706843) barcode into the femto" (Ben F slides). Is another field required in the upload CSV for the Tissue Tube ID? Traction's current QC upload requires the fields Tissue Tube ID and Sanger sample ID, as these are needed to create the QcResult entities. Could the Tissue Tube ID be added to the CSV? Or, again, what uniquely identifies a row in the CSV?

Q: "Would need to be careful the export settings don’t change e.g pg/µl>ng/µl" (Ben F). Which export settings are being referred to here (e.g. export to Traction or to Tableau)? In Traction's current QC upload, the predefined unit of a QC Assay Type is persisted to the MLWH, so we would need to ensure that every possible CSV heading is predefined in the Traction config with its single unit.
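If both header variants have to be supported, one option (an assumption, not an agreed design) is to convert the picogram/picomole variant into nanogram/nanomole units on upload, so that only one unit per assay type ever reaches the MLWH. A minimal Ruby sketch; `normalise_row` and `UNIT_CONVERSIONS` are hypothetical names, and the only facts relied on are 1 ng = 1000 pg and 1 nmol = 1000 pmol.

```ruby
# Hypothetical mapping: small-unit heading => [large-unit heading, divisor].
# 1 ng = 1000 pg and 1 nmol = 1000 pmol, so dividing by 1000 converts the value.
UNIT_CONVERSIONS = {
  'pg/uL' => ['ng/uL', 1000.0],
  'pmole/L' => ['nmole/L', 1000.0]
}.freeze

# Takes one CSV row as a Hash of heading => string value and returns a new Hash
# in which pg/uL and pmole/L values are re-keyed and converted to ng/uL and
# nmole/L. Blank values become nil; all other headings pass through unchanged.
# Float() raises on non-numeric text (e.g. "1,234"), which is assumed not to occur.
def normalise_row(row)
  row.each_with_object({}) do |(heading, value), out|
    target, divisor = UNIT_CONVERSIONS[heading]
    if target
      out[target] = (value.nil? || value.strip.empty?) ? nil : Float(value) / divisor
    else
      out[heading] = value
    end
  end
end
```

With this approach a `pg/uL` value of `"1500"` is stored as `1.5` under `ng/uL`, so the export-setting change Ben mentions would no longer change the unit persisted to the MLWH.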

harrietc52 commented 1 year ago

Benefits

The benefits of this work will depend on how far we take it. If we get the femto traces into the MLWH then I’ll be able to perform the autoQC in Tableau. We can use this to track the quality of the libraries over time and see how it correlates with prep type, input, clade… Currently it’s very difficult to spot trends in the quality, as we just have thousands of links to .pdf files showing the femto traces.

If we decide to get LIMS to run autoQC then it could speed up the process significantly. In this scenario we would upload the quant data into the MLWH; the user would then scan their plate into LIMS and download a .csv file that enables the robots to re-array the samples according to what needs to be done to them next (e.g. SPK3, ULI, or fail). Automating this would speed up the process, as the user wouldn’t have to interpret the quality of the traces. It would also remove any subjectivity in interpretation and a potential source of error when re-arraying samples.

The third option is just to run the autoQC in Excel and then paste the outcome into the current LR Google Sheet. Whilst this is feasible, I’d prefer it if long read didn’t have to juggle another spreadsheet.

harrietc52 commented 1 year ago

Femto CSV

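Headers (two variants, differing only in the units of columns D and F)

A Well | B Sample ID | C Range | D pg/uL | E % Total | F pmole/L | G Avg. Size | H %CV | I Size Threshold (b.p.) | J GQN
A Well | B Sample ID | C Range | D ng/uL | E % Total | F nmole/L | G Avg. Size | H %CV | I Size Threshold (b.p.) | J GQN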

Well (column A, string): the well the sample is in on the Femto assay plate (not the well the sample was in during extraction/library prep).

Sample ID (column B, string): currently whatever text the user typed into the Femto when setting up the run.
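A minimal parsing sketch using Ruby's standard `csv` library. It assumes, pending answers to the questions above, that a row is only meaningful when both Well and Sample ID are present, which also drops the empty rows interleaved with the data. `parse_femto_csv` is a hypothetical helper, not existing Traction code.

```ruby
require 'csv'

# Parses Femto "Smear Analysis Result" CSV text into an array of hashes.
# Assumption: rows without both Well (column A) and Sample ID (column B) are
# skipped, which removes blank lines and rows of empty fields. The remaining
# columns (C-J) are returned untouched as raw strings keyed by heading.
def parse_femto_csv(text)
  CSV.parse(text, headers: true).filter_map do |row|
    well = row['Well']&.strip
    sample_id = row['Sample ID']&.strip
    next if well.to_s.empty? || sample_id.to_s.empty?

    qc_values = row.to_h.reject { |heading, _| ['Well', 'Sample ID'].include?(heading) }
    { well: well, sample_id: sample_id, qc_values: qc_values }
  end
end
```

Keeping the QC values as raw strings leaves unit handling and validation to a later step; note that this sketch does not deduplicate repeated Sample IDs within a well, which is still an open question.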

CSV Examples

2022 02 22 19H 00M Smear Analysis Result.csv
2022 02 22 17H 32M Smear Analysis Result.csv
2022 02 15 20H 21M Smear Analysis Result.csv
2022 02 15 15H 58M Smear Analysis Result.csv
2022 02 15 18H 54M Smear Analysis Result.csv

harrietc52 commented 1 year ago

Implementation

QcAssayType contains key, label, used_by and unit. Could these represent the QC fields (columns C-J)?
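To illustrate the idea, a sketch of a per-heading config with one unit each, plus a check that rejects any heading without a configured assay type. `QcAssayType` here is a plain `Struct` standing in for the Traction model: only the four attribute names come from this issue, while every key, the `used_by` value, the units marked "assumed" and the `undefined_headings` helper are hypothetical.

```ruby
# Plain Struct standing in for Traction's QcAssayType; all values are hypothetical.
QcAssayType = Struct.new(:key, :label, :used_by, :unit, keyword_init: true)

# Hypothetical config: one assay type per possible Femto QC heading (columns C-J,
# both header variants), indexed by heading. Units marked "assumed" are guesses.
FEMTO_ASSAY_TYPES = [
  QcAssayType.new(key: 'femto_range', label: 'Range', used_by: 'femto', unit: 'bp'), # assumed
  QcAssayType.new(key: 'femto_conc_pg', label: 'pg/uL', used_by: 'femto', unit: 'pg/uL'),
  QcAssayType.new(key: 'femto_conc_ng', label: 'ng/uL', used_by: 'femto', unit: 'ng/uL'),
  QcAssayType.new(key: 'femto_percent_total', label: '% Total', used_by: 'femto', unit: '%'),
  QcAssayType.new(key: 'femto_molarity_pmol', label: 'pmole/L', used_by: 'femto', unit: 'pmole/L'),
  QcAssayType.new(key: 'femto_molarity_nmol', label: 'nmole/L', used_by: 'femto', unit: 'nmole/L'),
  QcAssayType.new(key: 'femto_avg_size', label: 'Avg. Size', used_by: 'femto', unit: 'bp'), # assumed
  QcAssayType.new(key: 'femto_cv', label: '%CV', used_by: 'femto', unit: '%'),
  QcAssayType.new(key: 'femto_size_threshold', label: 'Size Threshold (b.p.)', used_by: 'femto', unit: 'bp'),
  QcAssayType.new(key: 'femto_gqn', label: 'GQN', used_by: 'femto', unit: nil) # assumed unitless
].to_h { |type| [type.label, type] }.freeze

# Columns that identify a row rather than carry a QC value.
IDENTIFIER_HEADINGS = ['Well', 'Sample ID'].freeze

# Returns every uploaded heading with no configured assay type, so an upload
# can be rejected instead of persisting a value with an unknown unit.
def undefined_headings(headers)
  headers.reject { |heading| IDENTIFIER_HEADINGS.include?(heading) || FEMTO_ASSAY_TYPES.key?(heading) }
end
```

Rejecting unknown headings up front is one way to guarantee that each persisted value carries the single predefined unit of its assay type.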

However, QcResultsUploadFactory is quite specific to handling QcDecisions, QcResults and QcDecisionResults for Long Read and TOL. It validates that the decision headers are present, and these do not exist in the Femto CSVs.

QcDecisions and QcDecisionResults would not be needed here, only QcResults. However, QcResults still require the fields 'Tissue Tube ID' and 'Sanger sample ID'.

We could reuse create_qc_results in app/models/qc_results_upload_factory.rb and have QcResults exist on their own. Would this break anything? The message to the MLWH might also need to change, as it currently expects there to be QcDecisions.
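To make the question concrete, a sketch of building decision-free QC result records from one CSV row. It is plain Ruby rather than ActiveRecord and does not reflect the real create_qc_results: `QcResultRecord`, its fields and `build_qc_results` are hypothetical, and it assumes the Femto CSV gains Tissue Tube ID and Sanger sample ID columns, since QcResults require both.

```ruby
# Hypothetical stand-in for a QcResult with no QcDecision attached;
# field names are illustrative only, not Traction's schema.
QcResultRecord = Struct.new(:labware_barcode, :sample_external_id, :assay_key, :value, keyword_init: true)

# Builds one record per non-blank QC value in the row. assay_keys maps a CSV
# heading to a (hypothetical) assay type key. Raises when either identifier
# needed by QcResults is missing, as it is in the current Femto CSVs.
def build_qc_results(row, assay_keys)
  tube = row['Tissue Tube ID'].to_s.strip
  sample = row['Sanger sample ID'].to_s.strip
  raise ArgumentError, 'Tissue Tube ID and Sanger sample ID are required' if tube.empty? || sample.empty?

  assay_keys.filter_map do |heading, key|
    value = row[heading].to_s.strip
    next if value.empty?

    QcResultRecord.new(labware_barcode: tube, sample_external_id: sample, assay_key: key, value: value)
  end
end
```

Raising early makes the missing-identifier problem visible at upload time rather than when the MLWH message is built.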

SujitDey2022 commented 1 year ago

Hi @harrietc52, can you help me with the business priority of this requirement, and whether I can categorize it as a 'Must', 'Should' or 'Good to have' requirement? Also, is it a Technical Debt, Security or Housekeeping type of requirement? Thanks.