milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com
Other

mixcr preset for Fluidigm C1 platform #1105

Closed: malonzm1 closed this issue 10 months ago

malonzm1 commented 1 year ago

Hi,

What preset should be used with the Fluidigm C1 platform?

Thanks and good day.

Alex-Davydov commented 1 year ago

Hi, it depends on which chemistry you use on the C1 platform. Could you explain in more detail?

malonzm1 commented 1 year ago

Hi,

Thanks. Refer to page 3 of https://fluidigm.my.salesforce.com/sfc/p/#700000009DAw/a/4u0000019joV/BB2ZsZ6tu6Yv9jzLBHstJMnCfDNWGqPjG5ZiMhI01i0. Does this help?

mizraelson commented 10 months ago

Hi, according to the protocol, there are two cell barcodes: one in R1 and another in I1. I would recommend obtaining the non-demultiplexed data (i.e., not split by Nextera indices) and trying the command below. Note that you will also need the index read (I1) FASTQ file.

mixcr analyze generic-single-cell-gex \
--species hsa \
--tag-pattern "^(CELLROW:N{6})\^(R2:*)\^(CELLCOLUMN:N{10})" \
--set-whitelist CELLROW=file:fluidigmC1-CELLROW-whitelist.txt \
input_R1.fastq.gz \
input_R2.fastq.gz \
input_I1.fastq.gz \
output
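Before running MiXCR, it can help to eyeball which sequences the tag pattern above would treat as the two halves of each cell barcode. The following is a minimal POSIX sh/awk sketch, not a MiXCR feature: the function name `preview_cell_ids` is hypothetical, and it assumes that R1 and I1 list reads in the same order and that the barcodes sit where the pattern says (first 6 nt of R1 for CELLROW, first 10 nt of I1 for CELLCOLUMN).

```shell
#!/bin/sh
# Sketch (hypothetical helper, not a MiXCR command): print CELLROW and
# CELLCOLUMN for the first N reads. Assumes R1 and I1 list reads in the
# same order. Positions follow the tag pattern:
#   CELLROW    = first 6 nt of R1  (^(CELLROW:N{6}))
#   CELLCOLUMN = first 10 nt of I1 (^(CELLCOLUMN:N{10}))
preview_cell_ids() {
  # $1 = R1 FASTQ, $2 = I1 FASTQ, $3 = number of reads to show (default 5)
  awk -v r1="$1" -v i1="$2" -v n="${3:-5}" 'BEGIN {
    # gzip -cdf decompresses .gz input; GNU gzip also passes plain FASTQ through
    c1 = "gzip -cdf \"" r1 "\""
    c2 = "gzip -cdf \"" i1 "\""
    # Read both files in lockstep, one line at a time
    while (shown < n && (c1 | getline a) > 0 && (c2 | getline b) > 0) {
      line++
      # Line 2 of every 4-line FASTQ record is the sequence
      if (line % 4 == 2) {
        print substr(a, 1, 6) "\t" substr(b, 1, 10)
        shown++
      }
    }
    close(c1); close(c2)
  }'
}

# Usage (file names as in the command above):
# preview_cell_ids input_R1.fastq.gz input_I1.fastq.gz 10
```

Each output line is one read's CELLROW and CELLCOLUMN separated by a tab; the pair together identifies the cell.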

I have attached a whitelist of CELLROW barcodes, which I obtained from the manufacturer's demultiplexing script. You can also add a whitelist for the CELLCOLUMN barcode, using a similar list of the Nextera indices used for your samples.
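As a quick sanity check that the whitelist fits your data, one could count how many R1 reads start with an exact whitelist match. This is a hedged sketch: the helper name `cellrow_match_rate` is hypothetical, and it assumes the whitelist file holds one barcode per line (only the first whitespace-separated field is used) and that CELLROW is the first 6 nt of R1, as in the tag pattern.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report "matched/total" R1 reads whose first
# 6 nt (CELLROW) exactly match the whitelist. Assumes one barcode per line.
cellrow_match_rate() {
  # $1 = CELLROW whitelist, $2 = R1 FASTQ (.fastq.gz, or plain with GNU gzip)
  gzip -cdf "$2" | awk -v wl="$1" '
    BEGIN {
      # Load the whitelist into a lookup table
      while ((getline line < wl) > 0) {
        sub(/\r$/, "", line)                # tolerate Windows line endings
        if (split(line, f) > 0) ok[f[1]] = 1
      }
      close(wl)
    }
    # Sequence line of each FASTQ record: compare its first 6 nt
    NR % 4 == 2 { total++; if (substr($0, 1, 6) in ok) hit++ }
    END { printf "%d/%d\n", hit, total }'
}

# Usage:
# cellrow_match_rate fluidigmC1-CELLROW-whitelist.txt input_R1.fastq.gz
```

Exact matching counts any barcode with a sequencing error as a miss, so a ratio somewhat below 1 is expected; a very low ratio would suggest the barcode is at a different offset or comes from a different barcode set.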

Alternatively, if your data has already been demultiplexed and you cannot obtain the original non-demultiplexed files, you can use the command below:

mixcr analyze generic-single-cell-gex \
--species hsa \
--tag-pattern "^(CELLROW:N{6})\^(R2:*)" \
--set-whitelist CELLROW=file:fluidigmC1-CELLROW-whitelist.txt \
{{CELLCOLUMN:a}}_R1.fastq.gz \
{{CELLCOLUMN:a}}_R2.fastq.gz \
output

Place {{CELLCOLUMN:a}} at the position in the file name that encodes the column (Nextera index).
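To check which column values such a pattern would pick up, and that every R1 file has its R2 partner, a small listing helper may be useful. This sketch only mimics the placeholder for simple names of the assumed form `<CELLCOLUMN>_R1.fastq.gz`; the helper name `list_cellcolumns` is hypothetical, and MiXCR's own file-name expansion rules, not this script, decide which characters {{CELLCOLUMN:a}} actually accepts.

```shell
#!/bin/sh
# Sketch (hypothetical helper): list the CELLCOLUMN values implied by file
# names of the form <CELLCOLUMN>_R1.fastq.gz and warn (on stderr) when the
# matching _R2 file is missing. Approximates {{CELLCOLUMN:a}} only.
list_cellcolumns() {
  # $1 = directory holding the demultiplexed FASTQ files (default: .)
  dir="${1:-.}"
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue             # no match: the glob stays literal
    name="${r1##*/}"                     # strip the directory part
    col="${name%_R1.fastq.gz}"           # the part the placeholder stands for
    [ -e "$dir/${col}_R2.fastq.gz" ] || echo "warning: no R2 file for $col" >&2
    printf '%s\n' "$col"
  done
}

# Usage:
# list_cellcolumns /path/to/demultiplexed/fastq
```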

Note that because this is a 3' RNA-seq protocol, your reads will most likely not cover the CDR3 region: they come from the 3' end of each transcript, while the CDR3 lies in the V(D)J region toward the 5' end. This protocol wasn't designed for immune repertoire analysis.

fluidigmC1-CELLROW-whitelist.txt

malonzm1 commented 10 months ago

Many thanks!

malonzm1 commented 10 months ago

Hi,

Sorry if the answer is obvious. Regarding this command:

mixcr analyze generic-single-cell-gex \
--species hsa \
--tag-pattern "^(CELLROW:N{6})\^(R2:*)" \
--set-whitelist CELLROW=file:fluidigmC1-CELLROW-whitelist.txt \
{{CELLCOLUMN:a}}_R1.fastq.gz \
{{CELLCOLUMN:a}}_R2.fastq.gz \
output

Why is R1.fastq.gz (which contains the barcode information) still necessary when the cells have already been demultiplexed?

mizraelson commented 10 months ago

There are two cell barcodes. Demultiplexing by Nextera index separates the cells by CELLCOLUMN only, not by CELLROW, which is located inside R1. As a result, every pair of files still contains multiple cells that share a column index but differ in CELLROW, and R1 is needed to tell them apart.
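One way to see this on your own data is to tally the CELLROW barcodes inside a single demultiplexed R1 file. This is a minimal sketch under the same assumption as the tag pattern (CELLROW is the first 6 nt of R1); the helper name `count_cellrows` and the example file name are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): count reads per CELLROW barcode (first 6 nt
# of R1) in one demultiplexed R1 file, most frequent first. Several abundant
# barcodes in one file indicate that it still holds several cells.
count_cellrows() {
  # $1 = R1 FASTQ (.fastq.gz, or plain with GNU gzip)
  gzip -cdf "$1" |
    awk 'NR % 4 == 2 { n[substr($0, 1, 6)]++ }    # sequence lines only
         END { for (b in n) print n[b], b }' |
    sort -k1,1nr -k2,2                            # by count, then barcode
}

# Usage (hypothetical file name):
# count_cellrows TAAGGCGA_R1.fastq.gz | head
```

Sequencing errors create many low-count barcode variants, so the number of distinct lines overstates the number of cells; the few high-count barcodes at the top are the informative part.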