how to do demultiplexing

paulranum11 / SPLiT-Seq_demultiplexing

An unofficial demultiplexing strategy for SPLiT-seq RNA-Seq data

MIT License


how to do demultiplexing #10

BenxiaHu closed this issue 3 years ago.

paulranum11 commented 3 years ago

Hi biohubx,

Thanks for your interest in our demultiplexing tool. Our software was written to work with the originally published configuration of SPLiT-Seq (DOI: 10.1126/science.aam8999), which contains only 3 SPLiT-Seq barcodes. Configurations with 4 or 5 barcodes will not work with the current version. However, we would be happy to look into expanding the tool to support additional barcodes. The best way to do that would be for you to send me a small subset of your Forward and Reverse fastq files (the first 5000 lines of each should be sufficient). Please also let me know about any other departures from the published protocol, as these may cause additional incompatibilities. I hope this is helpful!
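A minimal Python sketch for extracting that subset, assuming plain or gzip-compressed (.gz) FASTQ input. Each FASTQ record is 4 lines, so 5000 lines corresponds to 1250 reads. The function names are hypothetical and not part of this tool; for uncompressed files, `head -n 5000` does the same job.

```python
import gzip
from itertools import islice
from pathlib import Path


def open_fastq(path, mode="rt"):
    """Open a plain or gzip-compressed (.gz) FASTQ file in text mode."""
    if Path(path).suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def write_fastq_subset(src, dst, n_lines=5000):
    """Copy the first n_lines lines of src to dst and return how many were written.

    FASTQ records are 4 lines long, so n_lines must be a multiple of 4
    to avoid truncating a read.
    """
    if n_lines % 4 != 0:
        raise ValueError("n_lines must be a multiple of 4")
    written = 0
    with open_fastq(src) as fin, open_fastq(dst, "wt") as fout:
        for line in islice(fin, n_lines):
            fout.write(line)
            written += 1
    return written
```

For example, `write_fastq_subset("sample_R1.fastq.gz", "sample_R1.head5000.fastq.gz")`, then the same for the R2 file; the file names are placeholders.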

Paul

paulranum11 commented 3 years ago

Hi biohubx,

I think the confusion is that barcodes 1-3 are the "SPLiT-Seq" barcodes, while barcode 4 is really an Illumina index.

Before demultiplexing with SPLiT-Seq_demultiplexing, you need to run Illumina's bcl2fastq to demultiplex your separately indexed Illumina datasets. This step will remove the "4th barcode" (the Illumina index). You will be left with fastq files that contain the three SPLiT-Seq barcodes (barcodes 1-3 in the image you attached above).

Did you use the published SPLiT-Seq barcode sequences? If so, you don't need to alter the provided files. If you used custom barcode sequences, you can provide them as described here: https://github.com/paulranum11/SPLiT-Seq_demultiplexing/issues/3

paulranum11 commented 3 years ago

In response to your question 3: yes, splitseqdemultiplex_0.2.1.sh does generate the expression matrix, with genes as rows and cells as columns.
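For illustration only, here is a generic sketch of how a genes (rows) x cells (columns) matrix is commonly assembled: count the distinct UMIs observed for each gene in each cell, so PCR duplicates that share a UMI are counted once. This is not the code inside splitseqdemultiplex_0.2.1.sh; the input format (cell barcode, UMI, gene tuples from reads that are already assigned and aligned) and all names below are assumptions.

```python
from collections import defaultdict


def build_count_matrix(records):
    """Build a genes (rows) x cells (columns) matrix of distinct-UMI counts.

    records: iterable of (cell_barcode, umi, gene) tuples (assumed format).
    Returns (genes, cells, matrix) where matrix[i][j] is the count for
    genes[i] in cells[j].
    """
    umis = defaultdict(set)  # (gene, cell) -> set of UMIs seen
    for cell, umi, gene in records:
        umis[(gene, cell)].add(umi)

    genes = sorted({gene for gene, _ in umis})
    cells = sorted({cell for _, cell in umis})
    row = {gene: i for i, gene in enumerate(genes)}
    col = {cell: j for j, cell in enumerate(cells)}

    matrix = [[0] * len(cells) for _ in genes]
    for (gene, cell), umi_set in umis.items():
        matrix[row[gene]][col[cell]] = len(umi_set)
    return genes, cells, matrix


# Hypothetical reads: (cell barcode = three 8 bp barcodes joined, UMI, gene).
example = [
    ("AACCGGTT_ACGTACGT_TTGGCCAA", "ACGTACGTAC", "Actb"),
    ("AACCGGTT_ACGTACGT_TTGGCCAA", "ACGTACGTAC", "Actb"),  # same UMI, counted once
    ("AACCGGTT_ACGTACGT_TTGGCCAA", "TTTTACGTAC", "Actb"),
    ("GGTTAACC_TGCATGCA_CCAAGGTT", "GGGGCCCCAA", "Gapdh"),
]
genes, cells, matrix = build_count_matrix(example)
print(genes)   # ['Actb', 'Gapdh']
print(matrix)  # [[2, 0], [0, 1]]
```

A sparse representation would be preferable for real datasets with thousands of cells; nested lists are used here only to keep the sketch dependency-free.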

paulranum11 commented 3 years ago

You said that you have 10 SPLiT-Seq barcoded datasets on hand. If you did not generate them, who did? Ask that person which barcodes were used, or read the paper associated with these datasets. It would be unusual for someone to use different barcode sequences, but it is possible.

If you are unsure how many SPLiT-Seq barcodes are in your fastq files, open the first 100 reads of your read2.fastq file in a text editor and look for similar and dissimilar sequences between the reads. Each barcode will be visible as an 8 bp variable region. There will also be a UMI, which is a 10 bp variable region (is this the 5th barcode you are referring to?).
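That manual inspection can be roughly automated. The hypothetical sketch below reads the first 100 reads of a FASTQ file, computes for each position the fraction of reads sharing the most common base, and reports runs of low-consensus positions. Constant linker positions should sit near 1.0, while barcodes (about 8 bp) and the UMI (about 10 bp) should appear as variable runs. The 0.9 threshold and all function names are assumptions, not part of this tool.

```python
from collections import Counter
from itertools import islice


def read_sequences(path, n_reads=100):
    """Return the sequence lines of the first n_reads records of a FASTQ file."""
    with open(path) as fh:
        lines = list(islice(fh, 4 * n_reads))
    # The sequence is the 2nd line of every 4-line FASTQ record.
    return [lines[i].strip() for i in range(1, len(lines), 4)]


def position_consensus(seqs):
    """Fraction of reads sharing the most common base at each position."""
    if not seqs:
        return []
    length = min(len(s) for s in seqs)  # compare only positions all reads cover
    fractions = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        fractions.append(counts.most_common(1)[0][1] / len(seqs))
    return fractions


def variable_regions(fractions, threshold=0.9):
    """Return (start, length) runs of positions whose consensus is below threshold."""
    regions, start = [], None
    for pos, frac in enumerate(fractions):
        if frac < threshold and start is None:
            start = pos  # a variable run begins
        elif frac >= threshold and start is not None:
            regions.append((start, pos - start))  # a constant position ends the run
            start = None
    if start is not None:  # run extends to the end of the read
        regions.append((start, len(fractions) - start))
    return regions
```

`variable_regions(position_consensus(read_sequences("read2.fastq")))` returns (start, length) pairs. Sequencing errors, adjacent variable regions with no linker between them, or a position where the barcodes happen to be skewed can merge or split runs, so treat the output as a hint to confirm by eye rather than a definitive barcode count.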

You should use "--version fast" or "splitseqdemultiplex_0.2.2.py", which is located in the "Python_only_tool" directory.

paulranum11 commented 3 years ago

Each version of SPLiT-Seq_demultiplexing comes with a small example dataset. I suggest you start with this example to confirm that you can get the software running. Once that works, compare the example to your own dataset. If your dataset is constructed similarly to the example, you can proceed to run the tool on it.
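One quick, hedged check when comparing your data with the bundled example is read length: if your read 2 is much shorter than the example's read 2, it may not contain all of the barcodes. The sketch below reports the most common read length among the first reads of a FASTQ file. The function names and the 1000-read sample size are assumptions, and matching lengths alone do not prove that the read structure matches.

```python
from collections import Counter
from itertools import islice


def read_length_profile(path, n_reads=1000):
    """Count read lengths among the first n_reads records of a FASTQ file."""
    with open(path) as fh:
        lines = list(islice(fh, 4 * n_reads))
    # The sequence is the 2nd line of every 4-line FASTQ record.
    return Counter(len(lines[i].strip()) for i in range(1, len(lines), 4))


def modal_read_length(path, n_reads=1000):
    """Most common read length, or None if the file has no reads."""
    profile = read_length_profile(path, n_reads)
    return profile.most_common(1)[0][0] if profile else None
```

For example, compare `modal_read_length("example_read2.fastq")` with `modal_read_length("my_read2.fastq")`; both paths are placeholders for the example's read 2 file and your own.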