sanger / sequencescape

Web based LIMS

DPL-554 SS New tag sets for Chromium Library Plate Manifest template #3696

Closed SujitDey2022 closed 3 months ago

SujitDey2022 commented 1 year ago

Title: Action to mitigate Tag issues will reduce the number of Sequencing Runs placed on hold, reducing overall sample TAT by days and reducing the resources needed for corrective actions (correctly undertaken by SSRs, Data QC and NPG)

Description: There are two types of manifest used by DNAP to upload library data into the LIMS. Currently, the version used by the Cellgen Faculty requires the tags to be entered as the actual sequence (e.g. i7 CACCGCACCA and i5 GACTGTCAAT). An example can be downloaded here: https://sequencescape.psd.sanger.ac.uk/sdb/sample_manifests/20146

If Cellgen were able to use the Chromium Library Plate Manifest version, which is tag plate based, then typos and incorrect reverse complementation would be mitigated. This would result in the correct tags being uploaded to the LIMS, removing the need for the QC team to put runs on hold, for SSRs to contact customers for the corrected tag sequences and re-upload to SS, and for NPG to be asked to re-run the flowcell analysis.

New tag plates can be added to the Chromium Library Plate Manifest template in SS (listed in acceptance criteria)

The data for these tag sets has kindly been provided by Conor Parks in CellGen (look for the tabs that match the names in the list in acceptance criteria): https://docs.google.com/spreadsheets/d/1KB8vs2vhAkUkAooe9_bjaOyV1CuFDLU0vuzhWslahJc/edit#gid=331796148

The data is correct as of 17/4/24.

Primary contacts for this story:
Richard C - SSR
Conor Parks - CellGen Contact

Who is the nominated tester for UAT: Richard C, Conor Parks

Acceptance criteria

To be considered successful the solution must allow:

Additional Information for developers

Tag sets are called 'tag groups' in Sequencescape. Data will need to be inserted into the tag_group and tag tables. It can actually be done in the Sequencescape UI here - https://training.sequencescape.psd.sanger.ac.uk/tag_groups/new

For the single index set above:

For the dual index sets above:

N.B. Many of the tag sets are already in Sequencescape - they would need to be renamed. In the event of renaming, stakeholders should be informed and code checked for references. See https://github.com/sanger/sequencescape/issues/3696#issuecomment-2127145714

SujitDey2022 commented 1 year ago

@Skrich1999 please can you confirm if this story is still relevant and needs to be planned into the development backlog? Thanks,

SujitDey2022 commented 7 months ago

@Skrich1999 to review and update the user story.

Skrich1999 commented 6 months ago

@SujitDey2022 I have spoken with the SUs and the data is current. I have added Conor Parks as a CellGen contact and added the Google link to the data set. This story can be progressed next week. Thanks.

KatyTaylor commented 6 months ago

The two library plate manifest types mentioned: [screenshot]

Empty 'Library Plate' manifest, showing 'i7 TAG SEQUENCE' and 'i5 TAG SEQUENCE' fields: [screenshot]

Empty 'Chromium Library Plate' manifest, showing 'CHROMIUM TAG GROUP' and 'CHROMIUM TAG WELL' fields: [screenshot]

Tag groups and tags data model: [screenshot]

KatyTaylor commented 6 months ago

I think all that's needed here is to insert some data into the 'tag group' and 'tag' tables in Sequencescape.

Four tag groups are needed - names listed out in the description, and in the names of the separate tabs in the Google Sheet. The adapter_type_id should link to the record in the tag_group_adapter_types table called 'Chromium'.

Under each tag group, many 'tags' should be created. The 'oligo' field should contain the DNA sequence (e.g. 'GAGGAGAGAG') found in the spreadsheet.

The map_id field represents the order of the tag in the group, or where it is on the tag plates - I'm currently not sure how to work this out from the spreadsheet - needs further discussion. I think from looking at the code that each well on the tag plate contains 4 tags - also not sure how this relates to the spreadsheet.

andrewsparkes commented 5 months ago

It's not clear to me either how the spreadsheet relates to the tag groups. The 'Single' group appears to have 4 columns of oligo sequences. The Dual ones have 2 variants, a and b, of the combinations of two oligos.

The existing Chromium tag groups in Production generally have 96 tags with map ids 1-96. The tagging screen in Limber allows selection of 2 tag groups (i5 and i7), so an individual well can have 1 or 2 tags in it. There are also tag_layout_templates: these are combinations of up to 2 tag groups along with the rules to lay them out on the plate (by map ids), e.g. by columns and wells of plate.

So I don't understand how you can get to 4 tags per well, or how you'd have only 4 tag groups from those spreadsheet tabs. Sounds like a chat with Conor is the first step.

SujitDey2022 commented 5 months ago

@andrewsparkes @KatyTaylor, @Skrich1999 has reached out to Conor and we should be hearing back from him, let me follow up on this and get back.

Skrich1999 commented 5 months ago

Good morning, @andrewsparkes and @KatyTaylor! I've noticed that Conor is out of office until the 13th. Would it be a good idea to arrange a meeting with the relevant faculty members to address and resolve the questions that have been raised?

Skrich1999 commented 5 months ago

Hello @andrewsparkes and @KatyTaylor, I've just had a discussion with the Faculty group. They suggest it's best to hold off until Conor returns on the 13th. Additionally, I've delved into understanding set A and set B. It seems this pertains more to NPG than SS, but I could be mistaken.

From what I gather, only set A should be utilized. Unfortunately, I'm unable to modify the Google document to gray out set B (which is essentially the reverse of set A).

This reversal has introduced a new acceptance criterion: Tags must function effectively across all short read platforms and be appropriately reverse complemented by NPG when necessary.
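To make "reverse complemented" concrete, here is a minimal Ruby sketch. The `reverse_complement` helper is hypothetical (it is not Sequencescape or NPG code), and it assumes oligos contain only the bases A, C, G and T. The input is the example i5 sequence quoted in the story description.

```ruby
# Hypothetical helper for illustration only; not part of Sequencescape or NPG.
# Reverses the oligo and swaps each base for its complement (A<->T, C<->G).
def reverse_complement(oligo)
  # Assumption: only unambiguous upper-case bases are allowed.
  raise ArgumentError, "unexpected base in #{oligo.inspect}" unless oligo.match?(/\A[ACGT]+\z/)

  oligo.reverse.tr('ACGT', 'TGCA')
end

i5 = 'GACTGTCAAT' # example i5 sequence from the story description
puts reverse_complement(i5) # prints ATTGACAGTC
```

Applying the helper twice returns the original sequence, which gives a quick sanity check when comparing two columns of a spreadsheet.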

I hope this clarifies things.

Skrich1999 commented 5 months ago

Tag sets A and B (for use when 10x don't and a LIMS system is able to reverse complement tags, like NPG, who automatically do this for us): https://kb.10xgenomics.com/hc/en-us/articles/360056364852-Should-I-select-Workflow-A-or-Workflow-B-for-the-i5-index-sequence

andrewsparkes commented 5 months ago

On hold until Conor returns on the 13th.

andrewsparkes commented 5 months ago

In the Sequencescape code the manifests use 2 specialised columns. See:

config/sample_manifest_excel/manifest_types.yml
config/sample_manifest_excel/columns.yml
app/sequencescape_excel/sequencescape_excel/specialised_field/chromium_tag_group.rb
app/sequencescape_excel/sequencescape_excel/specialised_field/chromium_tag_well.rb

The chromium_tag_well class code takes the well location entered in the manifest (e.g. A1) and translates that to fetch 4 sequential tags from the corresponding chromium_tag_group (e.g. map_id indexes 1, 2, 3 and 4) to give the 4 tags per well.

So that would suggest a single tag group is made that contains 4 x the usual number of oligo sequences (i.e. 384 for use with 96-well tag plates).

So we have to check, for each of the 4 tabs referenced in Conor's file, whether those tag groups are already created in the Sequencescape database or whether we need to make them. And if we make them, we have to be VERY careful to get the map indexes and oligo sequences correct.
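One way to guard against map index and oligo mistakes is a small pre-insert check. The sketch below is hypothetical and is not Sequencescape code: a tag is modelled as a plain Hash with `:map_id` and `:oligo` keys (mirroring the field names mentioned in this thread), and the rules checked (expected size, gap-free map ids starting at 1, oligos made only of A/C/G/T) are assumptions about what "correct" means here.

```ruby
# Hypothetical pre-insert check; not part of Sequencescape.
# Returns a list of human-readable problems (empty when none are found).
def tag_group_problems(tags, expected_size:)
  problems = []
  problems << "expected #{expected_size} tags, got #{tags.size}" unless tags.size == expected_size

  # Assumption: map ids should be the integers 1..N with no gaps or repeats.
  map_ids = tags.map { |tag| tag[:map_id] }
  unless map_ids.all?(Integer) && map_ids.sort == (1..tags.size).to_a
    problems << 'map_ids are not a gap-free run starting at 1'
  end

  # Assumption: oligos contain only upper-case A, C, G and T.
  tags.each do |tag|
    next if tag[:oligo].to_s.match?(/\A[ACGT]+\z/)

    problems << "map_id #{tag[:map_id].inspect} has an invalid oligo #{tag[:oligo].inspect}"
  end

  problems
end

# Usage with a deliberately broken two-tag group (map id gap and a typo):
tags = [
  { map_id: 1, oligo: 'GAGGAGAGAG' },
  { map_id: 3, oligo: 'CACCGCACCX' }
]
puts tag_group_problems(tags, expected_size: 384)
```

Running a check like this against the data transformed from the spreadsheet, before any records are created, would catch the most mechanical errors; it cannot tell whether a valid-looking oligo is the right one for that position.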

I think, for the SINGLE tab where we have 4 columns of oligos per row, we likely need a 384 oligo tag group, to be used in sets of 4 map_ids in a 4:1 relationship with the chromium_tag_well in the manifest.

Whereas for the 3 x DUAL tabs, these are likely standard 96 oligo tag groups with a 1:1 relationship to chromium_tag_well in the manifest.

KatyTaylor commented 5 months ago

Looks like some of these are actually in Sequencescape already, under different names. I haven't carefully checked every single tag, just a selection.

Tag groups can be looked up by name here - https://training.sequencescape.psd.sanger.ac.uk/tag_groups

| Name in story | Name in SS |
| --- | --- |
| Single Index Kit N, Set A 1000212 | Chromium single cell |
| Dual Index Kit TT, Set A 1000215 (workflow A) | 10X_Plate TT Set A i7 and 10X_Plate TT Set A i5 |
| Dual Index Kit TT, Set A 1000215 (workflow B) | Not present |
| Dual Index Kit TN, Set A 1000250 (workflow A) | REDUNDANT - Dual Index Kit TN Set A 10Xgenomics i5 (a), Dual Index Kit TN Set A 10Xgenomics i5_a Column Wise (same oligos but different order, column-wise), Dual Index TN Set A 10Xgenomics ColumnWise also related |
| Dual Index Kit TN, Set A 1000250 (workflow B) | Not present |
| Dual Index Kit TS, Set A 1000251 (workflow A) | FFPEvisium_i5, FFPEvisium_i7 (same oligos but different order, column-wise) |
| Dual Index Kit TS, Set A 1000251 (workflow B) | Not present |

KatyTaylor commented 5 months ago

Notes for developers

(apologies for the wordy brain dump - I might clean it up later!)

Single index tag groups:

In the spreadsheet (and I assume reality), this is a 96-well plate with 4 oligos listed per well. In SS, this is represented by one single tag group with 384 tags, all with unique 'map ids'. In the manifest, 'Chromium tag well' column, the SSR can fill out the correct well description e.g. 'A1'. When uploaded, SS will pull the relevant tags out of the tag group, e.g.

| Well in spreadsheet / reality | Translates to tag 'map id' |
| --- | --- |
| A1 | [1, 2, 3, 4] |
| B1 | [5, 6, 7, 8] |
| C1 | [9, 10, 11, 12] |

...etc.

See chromium_tag_well.rb for how this is achieved.
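For readers without the code to hand, the well-to-map-id arithmetic implied by the A1/B1/C1 examples can be sketched as follows. This is not the actual chromium_tag_well.rb implementation: the method name `chromium_map_ids` and the constants are hypothetical, and the sketch assumes a 96-well plate (rows A-H, columns 1-12) traversed in column order (A1, B1, ... H1, A2, ...).

```ruby
# Hypothetical sketch; not the code in chromium_tag_well.rb.
PLATE_ROWS = ('A'..'H').to_a # assumption: 96-well plate, 8 rows
PLATE_COLUMNS = (1..12)      # assumption: 12 columns
TAGS_PER_WELL = 4

# Returns the four map ids for a well description such as 'A1'.
def chromium_map_ids(well)
  match = /\A([A-H])(\d{1,2})\z/.match(well.to_s.strip.upcase)
  raise ArgumentError, "not a 96-well location: #{well.inspect}" unless match

  row = PLATE_ROWS.index(match[1])
  column = match[2].to_i
  raise ArgumentError, "column out of range: #{well.inspect}" unless PLATE_COLUMNS.cover?(column)

  # 0-based position of the well when walking the plate column by column.
  position = (column - 1) * PLATE_ROWS.size + row
  first = position * TAGS_PER_WELL + 1
  (first...(first + TAGS_PER_WELL)).to_a
end

p chromium_map_ids('A1') # prints [1, 2, 3, 4]
p chromium_map_ids('C1') # prints [9, 10, 11, 12]
```

Under these assumptions the last well, H12, maps to ids 381 to 384, which matches the 384-tag group size.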

Looks like this tag group is already in the db (see previous comment), it might just need to be renamed for clarity.

Dual index tag groups:

In the spreadsheet, there are 3 columns for each well. An i7 oligo, an i5 oligo for workflow A and an i5 oligo for workflow B. In SS, this will be represented as 3 tag groups - for i7, i5 (workflow A) and i5 (workflow B). The existing tag groups 10X_Plate TT Set A i7 and 10X_Plate TT Set A i5 are examples of the i7 and i5 (workflow A) columns respectively.

Tag groups are used in (at least) two places. These i7 and i5 tag groups can be selected on the Limber tagging screen - when a tag plate is actually being used in a pipeline. As described in this story, tags are also included in library manifests, for when customers submit ready-tagged libraries to SeqOps and have to tell them what tags were used in each well.

The library plate manifest has two columns, 'i7' and 'i5', where the customer can specify the oligo by typing out the sequence. The Chromium library plate manifest allows the customer to specify tag group and tag plate well instead, so as to avoid typos. It looks to me like there is no manifest that currently supports specifying dual index tag groups.

The Chromium library plate manifest allows the user to select a tag group that is not of the 4-per-well type. I tested selecting '10X_Plate TT Set A i7' and it broke on upload - because it tried to allocate 4 per well and then ran out of tags to allocate when it had used up all 96.

So for this story, we can insert the relevant tag groups into the database, but in order to be able to use the dual index tag groups in the manifests, we'd have to make new columns to support this - you'd need two drop downs 'i7 tag group' and 'i5 tag group'.

Ideally, we would also amend the existing Chromium library plate manifest to only display appropriate tag groups in the drop down - ones that have 4 tags per well.

Questions:

--> In SS, the tags in a tag group have an 'index' (see https://training.sequencescape.psd.sanger.ac.uk/tag_groups/251, for instance). The index does not really represent a well, just the order of the tags in the tag group (it does NOT link to the maps table in SS). Normal 96-well tag groups like the linked one expect their tags to be in column order (A1, B1 etc.) - I think - hence the order in the list in Sequencescape (linked above) is different to the order in the 10x spreadsheet (linked in the story description). Manifests that allow users to specify tag group and well must make an assumption about the mapping between 'index' and well location on the tag plate. This mapping is different for different tag groups - for a different example, see the single index tag groups described above.
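To illustrate why the assumed ordering matters, the sketch below converts a tag 'index' to a well location under two different assumptions. It is hypothetical (the method `well_for_index` is not Sequencescape code) and assumes a 96-well plate with one tag per well.

```ruby
# Hypothetical sketch; not part of Sequencescape.
# Converts a 1-based tag index to a well name on a rows x columns plate,
# walking the plate either column by column or row by row.
def well_for_index(index, order: :column, rows: 8, columns: 12)
  raise ArgumentError, "index out of range: #{index.inspect}" unless (1..rows * columns).cover?(index)

  position = index - 1
  row, column =
    case order
    when :column then [position % rows, position / rows]       # A1, B1, C1, ...
    when :row then [position / columns, position % columns]    # A1, A2, A3, ...
    else raise ArgumentError, "unknown order #{order.inspect}"
    end

  "#{('A'.ord + row).chr}#{column + 1}"
end

puts well_for_index(2, order: :column) # prints B1
puts well_for_index(2, order: :row)    # prints A2
```

The same index lands in different wells depending on the order, so a list sorted one way will not line up with a spreadsheet sorted the other way unless the mapping is applied explicitly.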

andrewsparkes commented 4 months ago

re: Ideally, we would also amend the existing Chromium library plate manifest to only display appropriate tag groups in the drop down - ones that have 4 tags per well.

Background: The manifest has a column called chromium_tag_group. This references a range called chromium_tag_groups. This selects tag groups based on a scope called chromium. This filters for tag_groups that have an adapter_type of 'chromium' (see scope in app/models/tag_group.rb).
So the manifest is trying to limit what's in that dropdown to chromium tag groups.

That doesn't seem to be specific enough given the example above for '10X_Plate TT Set A i7' (which does have the chromium adapter_type but has 96 tags rather than 384).
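A stricter filter could also look at how many tags a group holds, since at four tags per well a 96-tag group covers only 24 wells. The sketch below uses plain Ruby objects rather than the real `TagGroup` model or its `chromium` scope; `TagGroupSummary` and `four_per_well_chromium_groups` are hypothetical names, and the tag counts in the usage example are the ones stated in this thread.

```ruby
# Hypothetical sketch using plain structs; not the scope in app/models/tag_group.rb.
TagGroupSummary = Struct.new(:name, :adapter_type, :tag_count, keyword_init: true)

# Keeps only chromium groups with enough tags for every well on the plate.
def four_per_well_chromium_groups(groups, wells: 96, tags_per_well: 4)
  groups.select do |group|
    group.adapter_type.to_s.casecmp?('chromium') && group.tag_count >= wells * tags_per_well
  end
end

groups = [
  TagGroupSummary.new(name: 'Chromium single cell', adapter_type: 'Chromium', tag_count: 384),
  TagGroupSummary.new(name: '10X_Plate TT Set A i7', adapter_type: 'Chromium', tag_count: 96)
]
puts four_per_well_chromium_groups(groups).map(&:name) # prints Chromium single cell
```

Whether tag count is the right discriminator, as opposed to an explicit flag or a separate adapter type, is a design question this sketch does not settle.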

yoldas commented 4 months ago

kt17, as28 and ay6 have talked about transforming data from the Google Doc and creating default records in Sequencescape.

yoldas commented 4 months ago

Intermediate files for DPL-554 are at commit f04b929aef53509efa41a7f5156efdf5d191840a

yoldas commented 3 months ago

Dual index tag groups have now been moved to the WIP-flagged file config/default_records/tag_groups/004_chromium_dual_index.wip.yml