minoda-lab / universc

UniverSC: a flexible cross-platform single-cell data processing pipeline
https://genomec.gsc.riken.jp/gerg/UniverSC/UniverSC_app_release/
GNU General Public License v3.0

support for Singleron GEXSCOPE v2 technology #17

Open alefrol638 opened 1 year ago

alefrol638 commented 1 year ago

Dear minoda-lab,

Could you please add support for the Singleron GEXSCOPE v2 technology (https://github.com/singleron-RD/CeleScope)? The barcode and UMI are built in the following way: C9L16C9L16C9L1U12T18. C stands for cell barcode, L is the linker, U is the UMI, and T is the poly-T sequence.

Thank you!
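For readers unfamiliar with this notation, the layout string in the request can be read as a sequence of (segment type, length) pairs. The following is a minimal sketch of that reading, not code from UniverSC or CeleScope; the function names `parse_pattern` and `split_read` are hypothetical and only illustrate how the string maps onto positions in a read.

```python
import re
from typing import Dict, List, Tuple


def parse_pattern(pattern: str) -> List[Tuple[str, int]]:
    """Split a layout string such as 'C9L16C9L16C9L1U12T18' into (code, length) pairs."""
    segments = [(code, int(length)) for code, length in re.findall(r"([CLUT])(\d+)", pattern)]
    # Rebuild the string to catch characters that the regex silently skipped.
    if "".join(f"{code}{length}" for code, length in segments) != pattern:
        raise ValueError(f"unrecognised pattern: {pattern!r}")
    return segments


def split_read(read: str, pattern: str) -> Dict[str, str]:
    """Concatenate the bases of each segment type (C, L, U, T) found in a read."""
    parts = {"C": "", "L": "", "U": "", "T": ""}
    pos = 0
    for code, length in parse_pattern(pattern):
        parts[code] += read[pos:pos + length]
        pos += length
    return parts


PATTERN = "C9L16C9L16C9L1U12T18"  # layout string quoted in this issue
segments = parse_pattern(PATTERN)
barcode_length = sum(length for code, length in segments if code == "C")
umi_length = sum(length for code, length in segments if code == "U")
```

For this particular string, the three C segments add up to a 27-base cell barcode, and the single U segment gives a 12-base UMI.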

TomKellyGenetics commented 7 months ago

Thank you for sharing the specifications. I have a new release planned to add additional features, so it should be possible to integrate this. It appears to have similarities to other technologies already supported, so it should not be too much trouble.

To clarify, is this barcode / linker / UMI in the read 1 (R1) file?

TomKellyGenetics commented 7 months ago

I'll note that the scripts in the linked repo document alignment of RNA to read 2 and barcodes in read 1.

They also provide details on chemistry and barcode whitelists for each version: https://github.com/singleron-RD/CeleScope/blob/master/doc/chemistry.md

Because of the linker sequences above, this layout cannot currently be handled with custom barcode inputs (but support can be added).

TomKellyGenetics commented 7 months ago

I've had a quick look at this technology. The barcode specifications provided and the whitelist sequences in the MIT-licensed CeleScope directory should be sufficient to configure UniverSC with a minor update. The linker sequences can be removed with a subroutine similar to the one already used for SureCell and PIP-Seq.

Linkers for versions 2.0.0 and 2.1.0 appear to be identical. The linkers for versions 2.0.1 and 2.1.1 have an additional C base on the ends. Version 3.0.1 is more complex with 4 linkers and 4 barcode whitelists but they are each the same length.
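As a rough illustration of what linker removal involves, the sketch below keeps only the barcode and UMI bases of a read 1 record and drops the linker positions, applying the same slicing to the quality string. It is not the UniverSC subroutine. The layouts are assumptions made for the example: three 8 bp barcode blocks (to match a 24 bp barcode) separated by 16 bp linkers, with the second layout adding one linker base after the last barcode.

```python
from typing import List, Tuple

# (segment code, length); C = barcode block, L = linker, U = UMI
Layout = List[Tuple[str, int]]

# Illustrative layouts only (assumptions, not taken from UniverSC or CeleScope).
LAYOUT_NO_EXTRA_BASE: Layout = [("C", 8), ("L", 16), ("C", 8), ("L", 16), ("C", 8), ("U", 8)]
LAYOUT_EXTRA_BASE: Layout = [("C", 8), ("L", 16), ("C", 8), ("L", 16), ("C", 8), ("L", 1), ("U", 8)]


def trim_linkers(seq: str, qual: str, layout: Layout) -> Tuple[str, str]:
    """Return (sequence, quality) with linker positions removed.

    Barcode (C) and UMI (U) segments are kept in their original order, so the
    result is a contiguous barcode followed by the UMI.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < sum(length for _, length in layout):
        raise ValueError("read is shorter than the layout requires")
    kept_seq, kept_qual = [], []
    pos = 0
    for code, length in layout:
        if code in ("C", "U"):
            kept_seq.append(seq[pos:pos + length])
            kept_qual.append(qual[pos:pos + length])
        pos += length
    return "".join(kept_seq), "".join(kept_qual)
```

With layouts like these, the two variants differ only in where the UMI starts, so the same routine can serve both once the correct layout is selected.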

Implementing version 2.0.0 as requested should not take much time but I'll consider supporting multiplex versions at the same time due to their similarities. This will take a little more time to ensure the documentation and input arguments allow a user to select the correct parameters. I'd expect to have this done within a few weeks (it should only take a few hours if I can schedule it).

TomKellyGenetics commented 7 months ago

To clarify, this chemistry is not supported in v1.2.7 but is planned for an upcoming release.

alefrol638 commented 7 months ago

All right, I will check it out once you have made the next release. Thank you very much!

TomKellyGenetics commented 7 months ago

Great, I'll update this issue when it is ready so it should send you a notification when you can try it out. Thanks for sharing the design of the barcodes, it really saves us a lot of time to add new technologies. If I can manage it, I'll add support for multiple versions at the same time.

TomKellyGenetics commented 7 months ago

I've confirmed via md5sum that versions 2.0.0-2.1.1 use the same 96-well barcode whitelist. Supporting version 3.0.1 would be more difficult, as 96, 192, and 384 barcode lists are provided. However, adding version 2 as requested seems feasible with a minor update.
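A checksum comparison of this kind can also be scripted. The sketch below is one possible way to group files by their MD5 digest using only the Python standard library. The helper names are hypothetical, and the file paths to compare would need to be supplied by the reader, since the thread does not list them.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_checksum(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group files by digest; files in the same group are byte-identical in practice."""
    groups: Dict[str, List[Path]] = {}
    for path in paths:
        groups.setdefault(md5_of_file(path), []).append(path)
    return groups
```

A matching digest indicates identical file contents. Whitelists that hold the same barcodes in a different order, or with different line endings, would produce different digests, so a mismatch alone does not prove the barcode sets differ.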

TomKellyGenetics commented 7 months ago

I've added GEXSCOPE barcodes and documentation to the development version on my personal account. I'll merge it into the organisation repository when it is ready to use. Currently, barcode combinations and linker trimming are not implemented for this technology.

TomKellyGenetics commented 7 months ago

@alefrol638 The development branch now supports this feature. The following versions are supported:

For GEXSCOPE v1 all combinations are permitted as CeleScope does not use a whitelist.

For GEXSCOPE v2 all versions have the same barcode whitelist. Bear in mind that v2.0.0 and v2.1.0 have the same linkers, but v2.0.1 and v2.1.1 have an additional C after the last barcode. v2.0 and v2.1 have different UMI lengths.

For GEXSCOPE v3 all 4 combinations of barcode whitelists and linkers are supported assuming no overlap between them.

-  GEXSCOPE version 1.0.0 (12 bp barcode, 8 bp UMI): gexscope-v1.0.0
-  GEXSCOPE version 2.0.0 (24 bp barcode, 8 bp UMI): gexscope-v2.0.0
-  GEXSCOPE version 2.0.1 (24 bp barcode, 8 bp UMI): gexscope-v2.0.1
-  GEXSCOPE version 2.1.0 (24 bp barcode, 12 bp UMI): gexscope-v2.1.0
-  GEXSCOPE version 2.1.1 (24 bp barcode, 12 bp UMI): gexscope-v2.1.1
-  GEXSCOPE version 2.2.1 (24 bp barcode, 12 bp UMI): gexscope-v2.2.1
-  GEXSCOPE version 3.0.1 (27 bp barcode, 12 bp UMI): gexscope-v3.0.1 

https://github.com/minoda-lab/universc/tree/dev

TomKellyGenetics commented 7 months ago

I plan to merge this into the next release when BD Rhapsody issues have been investigated.