sanger / crawler

Store sample data from Lighthouse labs
MIT License

GPL-642 Parsing and persisting CT values from AP #108

Closed (rl15 closed 4 years ago)

rl15 commented 4 years ago

User story GPL-642 | As senior stakeholders (Jeff B & John S), we would like to persist CT values for Alderley Park (AP) so that samples can later be filtered out from sentinel on the basis of these values, to increase sequencing success rates

Who are the primary contacts for this story: Sonia G, Cristina A, Jeff B, John S (PAM), Rob A, Emma G

Candidate Acceptance criteria

Jeff B wrote (Wednesday, 9 September 2020 at 10:59)

To answer your questions:

For the Cq columns, the valid range is a float between 0 and 100, can be null
For the Result columns, the valid values are Positive and Negative, can be null
For the Target columns, this is a string with currently one of only four values {‘ORF1ab’, ‘N gene’, ‘S gene’, ‘MS2’}. It may be sensible to make these the only valid values, and then if the labs change Targets without telling us it will trigger an exception. Though it will be important to be able to add new Targets in the future.

Result (in col E) = ‘Positive’ but all CHn-Result = ‘Negative’
CT value less than 0 or greater than 100
CHn-Target value not in set {‘ORF1ab’, ‘N gene’, ‘S gene’, ‘MS2’}? NB This set will have further elements in future (suggest it is lightweight to add new elements)
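To make the per-channel checks above concrete, here is a minimal sketch in Python. It is not taken from the crawler code: `validate_channel`, `ALLOWED_TARGETS` and the other names are hypothetical, and treating a blank Target as acceptable is an assumption. The Cq bounds are parameters because this thread quotes both 0 to 100 and (0..40).

```python
from typing import List, Optional

# Hypothetical names, for illustration only.
# Keeping the targets in one set makes adding a new Target a one-line change.
ALLOWED_TARGETS = {"ORF1ab", "N gene", "S gene", "MS2"}
ALLOWED_CHANNEL_RESULTS = {"Positive", "Negative"}


def is_blank(value: Optional[str]) -> bool:
    """Treat None and empty/whitespace-only strings as null."""
    return value is None or value.strip() == ""


def validate_channel(
    target: Optional[str],
    result: Optional[str],
    cq: Optional[str],
    cq_min: float = 0.0,
    cq_max: float = 100.0,
) -> List[str]:
    """Return a list of error messages for one CHn-Target/Result/Cq triple."""
    errors = []
    # Assumption: a blank Target is allowed; only unknown non-blank values fail.
    if not is_blank(target) and target.strip() not in ALLOWED_TARGETS:
        errors.append(f"unknown Target: {target!r}")
    if not is_blank(result) and result.strip() not in ALLOWED_CHANNEL_RESULTS:
        errors.append(f"invalid Result: {result!r}")
    if not is_blank(cq):
        try:
            value = float(cq)
        except ValueError:
            errors.append(f"Cq is not a number: {cq!r}")
        else:
            if not (cq_min <= value <= cq_max):
                errors.append(f"Cq out of range: {value}")
    return errors
```

An empty list means the triple passed; for example `validate_channel("ORF1ab", "Positive", "23.5")` returns no errors, while an unlisted Target or a Cq of 101 each add one message.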

File standard changes

Current AP file format (cols A-G)
Root Sample ID
Viral Prep ID
RNA ID
RNA-PCR ID
Result
Date Tested
Lab ID

extended by new file headers (cols H-S)

CH1-Target {‘ORF1ab’}
CH1-Result {‘Positive’, ‘Negative’, null}
CH1-Cq {(0..40), null}
CH2-Target {‘N gene’}
CH2-Result {‘Positive’, ‘Negative’, null}
CH2-Cq {(0..40), null}
CH3-Target {‘S gene’}
CH3-Result {‘Positive’, ‘Negative’, null}
CH3-Cq {(0..40), null}
CH4-Target {‘MS2’}
CH4-Result {‘Positive’, ‘Negative’, null}
CH4-Cq {(0..40), null}
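A rough sketch of reading a file with the extended headers, using only the Python standard library. The function names are hypothetical and this is not the crawler's actual import routine. It assumes 'Date Tested' and 'Lab ID' are separate columns (cols F and G) and that empty cells stand for null.

```python
import csv
import io

# Header names as listed in this issue; the split of cols F and G is an assumption.
BASE_HEADERS = [
    "Root Sample ID", "Viral Prep ID", "RNA ID", "RNA-PCR ID",
    "Result", "Date Tested", "Lab ID",
]
# CH1-Target, CH1-Result, CH1-Cq, ... CH4-Cq (cols H-S)
CHANNEL_HEADERS = [
    f"CH{n}-{field}" for n in range(1, 5) for field in ("Target", "Result", "Cq")
]


def missing_headers(fieldnames):
    """Return the expected headers that are absent from the file."""
    present = set(fieldnames or [])
    return [h for h in BASE_HEADERS + CHANNEL_HEADERS if h not in present]


def read_rows(csv_text):
    """Parse CSV text into dicts, mapping empty cells to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = missing_headers(reader.fieldnames)
    if missing:
        raise ValueError(f"missing headers: {missing}")
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]
```

Raising on missing headers is one possible choice; a file in the old A-G format would need either a separate code path or a more lenient check.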

See example file below (zipped)

Additional context

Business impact will be in at least three parts:

  1. Parsing the files for AP and persisting the CT values (this backlog item)
  2. Repeating for one or more of the other LH sites (Agreed to do AP first in project meeting on 23rd Sept)
  3. Using this data to filter out any results = positive from the '+ves on site LH report', available here: http://lighthouse-ui.psd.sanger.ac.uk/ (maybe this backlog item or another; to discuss, especially reporting)

Team discussion

Discussed post stand up on the 24th with reference to this diagram: https://app.lucidchart.com/invitations/accept/6499b3a0-7a14-4192-b5db-1cd35b84646f

Agreed the best approach is to parse and persist first (in MLWH), then filter as a separate backlog item (GPL-659)

Even if we are treating these like negatives for Cherry picking, they should not be seen as negatives from a reporting context

We would expect that, in the API call from Beckman robotics for +ve wells (GPL-567 & 568), samples with high CT values will be filtered out from automated picks.

rl15 commented 4 years ago

AP_sanger_report_200907_1531.csv.zip

rl15 commented 4 years ago

For completeness, we also received an example file from Glasgow. The file is not to the same standard as AP & MK:

Has a table at the far right of the file, possibly created manually
Critically, missing RNA plate barcode (& Viral Prep ID)
New value in Channel results: 'Inconclusive'

Rich L wrote (Tuesday, 29 September 2020 at 16:42)

What is the rule for this:

Filter out sample from any pick list if result (col E) is positive (Y/N)?
Log as error (Y/N)?
Insert into database (& MLWH) (Y/N)?

Looking at the example file from GLAS, received yesterday afternoon, I noted that it has a new value for a channel. I have requested clarification on how this value should be managed.

Jeff B wrote (Wednesday, 30 September 2020 at 07:48)

Inconclusive in Channel results should be treated the same as Negative. That is, across channels 1-3, if any channel Result is Positive the overall Result should be Positive; if all channels are Negative, Inconclusive, or Void, the Result should be Negative. Anything inconsistent with that rule should be flagged as an error, and removed from the picklist.
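Jeff's rule can be sketched roughly as below. The function names are hypothetical. The rule as written does not say what to do when a channel Result is null, so returning "undetermined" in that case is an assumption here, not something agreed in this thread.

```python
from typing import List, Optional

# Channel values that count as non-positive under the rule above.
NON_POSITIVE = {"Negative", "Inconclusive", "Void"}


def expected_result(channel_results: List[Optional[str]]) -> Optional[str]:
    """Derive the overall Result implied by the Results of channels 1-3.

    Returns None when the rule does not decide (e.g. a null channel
    Result with no Positive channel).
    """
    if any(r == "Positive" for r in channel_results):
        return "Positive"
    if channel_results and all(r in NON_POSITIVE for r in channel_results):
        return "Negative"
    return None


def classify(overall: Optional[str], channel_results: List[Optional[str]]) -> str:
    """Compare the reported overall Result (col E) with the rule."""
    expected = expected_result(channel_results)
    if expected is None:
        return "undetermined"  # assumption: left for the caller to handle
    return "consistent" if overall == expected else "inconsistent"
```

Rows classified as "inconsistent" would be the ones to flag as an error and remove from the picklist. Only channels 1-3 should be passed in, since the rule excludes channel 4 (MS2).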

rl15 commented 4 years ago

Alan K wrote (Wednesday, 30 September 2020 at 09:26)

... I have also attached the example file sent over from MK last night. They have made a change to their development lims environment to include the cq values in the daily sanger report. This is ready to go once we are able to “change import routines to consume the extra fields”.

MK_sanger_report_200929_2311.csv.zip

Rich L wrote (Wednesday, 30 September 2020 at 11:51)

Note MK have introduced 2 new values to the Result (col E) in this example file

Current valid values are ‘Positive’, ‘Negative’, ‘Void’, null
Two new values in the example file: ‘Detected’ & ‘Not Detected’

Assume neither is a Positive result?
Persist in database
Don’t log as error.

Waiting for answer

andrewsparkes commented 4 years ago

N.B. That MK file contains the headers for the CT channel columns but no actual data in any of the rows. The headers match the AP example at least, which is good.