sanger / limber

A config-driven LIMS built on Sequencescape, primarily for running library preparation pipelines in the laboratory
MIT License

Y24-313 - PBMC donor pooling - use requested number of samples per pool #1909

Open KatyTaylor opened 1 week ago

KatyTaylor commented 1 week ago

User story

As a user of the scRNA Core pipeline, I would like the automatic allocation of samples to pools to behave differently.

Currently, a config table is used that maps number of samples to number of pools (see interim_scrna_core_donor_pooling.yml and comment here). Instead, we would like to specify a number of samples per pool, which could vary from pool to pool - this information will come in at the point of submission (implemented in https://github.com/sanger/sequencescape/issues/4337).

This is a first step in supporting a more flexible pooling strategy, where the customer can specify the number of cells per pool and the number of cells per chip well. This story is to support a pre-MVP so that something can be in place for when R&D do a test run.
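To make the requested behaviour concrete, here is a minimal Python sketch of allocating samples when each pool has its own requested size. It is an illustration only, not Limber's implementation: the function name `allocate_to_pools` and the plain-string sample identifiers are hypothetical, and the sketch ignores the donor, study and cost code constraints that the real pooling logic has to respect.

```python
from typing import List, Sequence


def allocate_to_pools(samples: Sequence[str], pool_sizes: Sequence[int]) -> List[List[str]]:
    """Split samples into consecutive pools of the requested sizes.

    Hypothetical helper: pool_sizes stands in for the 'number of samples
    per pool' values supplied at submission, one entry per pool.
    """
    if any(size <= 0 for size in pool_sizes):
        raise ValueError("every requested pool size must be positive")
    if sum(pool_sizes) != len(samples):
        raise ValueError(
            f"requested pool sizes total {sum(pool_sizes)}, "
            f"but {len(samples)} samples were supplied"
        )

    pools: List[List[str]] = []
    start = 0
    for size in pool_sizes:
        # Take the next `size` samples in their original order.
        pools.append(list(samples[start:start + size]))
        start += size
    return pools


if __name__ == "__main__":
    example_samples = [f"S{i}" for i in range(1, 7)]
    print(allocate_to_pools(example_samples, [2, 4]))
```

Unlike a lookup table keyed on the total number of samples, the pool layout here is driven entirely by the values that arrive with the submission.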

Who are the primary contacts for this story

Abby, Katy

Who is the nominated tester for UAT

Abby

Acceptance criteria

To be considered successful the solution must allow:

Dependencies

The number of samples per pool will come in as part of the submission:

It's not blocking as, for this story, we can just assume it's stored against the request metadata.

Additional context

This is for the 'pre-MVP' version, to be used for R&D's end to end test. This will have to be re-jigged for MVP to allow a variable number of samples per pool, and potentially to add logic to achieve an even distribution of concentrations if possible.

KatyTaylor commented 3 days ago

Have asked Lesley the following in Slack:

We've got a couple of assumptions we'd like to make regarding the pooling strategy - just for the 'pre-MVP' version for your end to end test:

  1. The total number of samples will be exactly divisible by the requested number of samples per pool, e.g. 80 samples with 10 samples per pool (not 81/10 or 80/9).
  2. There will be an even number of pools - this is because we have a known issue where, for an odd number of pools, we get an uneven distribution of samples across the pools. I think you are going to test a) 8 pools of 10, and b) 8 pools of 24.
  3. There will be one study and one cost code per chip, and no 'duplicate' samples that have the same donor.

Is it OK to make all these assumptions for the 'bare bones' version?
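
The first two assumptions can be written as a quick check. This is only a sketch in Python to pin down the arithmetic, not code from Limber; `number_of_pools` is a hypothetical name, and assumption 3 (one study and cost code per chip, no duplicate donors) is not covered here.

```python
def number_of_pools(total_samples: int, samples_per_pool: int) -> int:
    """Return the pool count, enforcing the pre-MVP assumptions 1 and 2.

    Hypothetical helper: raises ValueError when the samples do not divide
    exactly into pools, or when the resulting number of pools is odd.
    """
    if total_samples <= 0 or samples_per_pool <= 0:
        raise ValueError("sample counts must be positive")

    # Assumption 1: the total is exactly divisible by the requested pool size.
    if total_samples % samples_per_pool != 0:
        raise ValueError(
            f"{total_samples} samples cannot be split into pools of {samples_per_pool}"
        )

    pools = total_samples // samples_per_pool

    # Assumption 2: an even number of pools, to sidestep the known
    # uneven-distribution issue with odd pool counts.
    if pools % 2 != 0:
        raise ValueError(f"{pools} pools is an odd number of pools")

    return pools


if __name__ == "__main__":
    print(number_of_pools(80, 10))   # 8 pools of 10
    print(number_of_pools(192, 24))  # 8 pools of 24
```

With this check, 81 samples at 10 per pool and 80 samples at 9 per pool are both rejected, as is 30 samples at 10 per pool (3 pools).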
KatyTaylor commented 3 days ago

Looked at this with Andrew. Storing the 'requested number of cells per pool' on request metadata is not straightforward because we'd have to add a new column. Thought of the following options:

  • New fields on Request Metadata
  • Retrieve directly from request_options hash on Order (think there is one Order per Study so it's ok at this level)
  • PolyMetadata against Order or Request
  • Somewhere else?

dasunpubudumal commented 3 days ago

Looked at this with Andrew. Storing the 'requested number of cells per pool' on request metadata is not straightforward because we'd have to add a new column. Thought of the following options:

  • New fields on Request Metadata
  • Retrieve directly from request_options hash on Order (think there is one Order per Study so it's ok at this level)
  • PolyMetadata against Order or Request
  • Somewhere else?

oh-oh :D I have done a bit of work on https://github.com/sanger/sequencescape/pull/4346/files. Is it inconsistent with what you guys have discussed? I thought we were going to add a new column.