milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Mapping TCR info to individual cells (scRNA-seq dataset) #1326

Closed: malonzm1 closed this issue 10 months ago.

malonzm1 commented 10 months ago

Hi,

I would like to map the MiXCR TCR output to individual cells (using scRNA-seq data). That is, I want to determine which cells contribute to the identified TCR repertoire. Is this possible?

Thanks and good day.

mizraelson commented 10 months ago

Hi. Do you mean mapping bulk TCR repertoire sequencing (assembled clones without cell barcodes) to scRNA-seq?

malonzm1 commented 10 months ago

Thanks. I generated a TCR repertoire with MiXCR from scRNA-seq data. I would like to know which cells actually contain the clonotypes. Does that make sense?

mizraelson commented 10 months ago

What MiXCR command did you use?

malonzm1 commented 10 months ago

I used:

mixcr analyze rna-seq -s hsa --rna 1.fastq.gz 2.fastq.gz outdir

and

mixcr analyze rna-seq -s hsa --rna {{CELL0ROW:a}}_1.fastq.gz {{CELL0ROW:a}}_2.fastq.gz outdir

However, the data is scRNA-seq (I ran into errors when using the 10x and Smart-seq2 presets).

mizraelson commented 10 months ago

This preset is designed for bulk RNA-seq data. What protocol did you use for scRNA-seq? Was it 10x? What error did you get?

malonzm1 commented 10 months ago

I used 10x and Smart-seq2. For 10x, there's no preset for 3'.

mizraelson commented 10 months ago

Which 10x chemistry did you use, v2 or v3? Both are available for 3', and they differ in barcode structure (a 10-nt vs. a 12-nt UMI after the 16-nt cell barcode, and different cell barcode whitelists). Also, did you encounter any errors using smart-seq2? Can you share the text of the error?

malonzm1 commented 10 months ago

What are the presets for 10x v2 or v3 for 3'? Can I map the TCR clonotypes to the individual cells?

mizraelson commented 10 months ago

We originally didn't include a 3' preset because the clonotype yield from this type of data is very low.

If you're using 10x 3' v2, the barcodes are identical to those of 5' chemistry, so you can use the following preset:

mixcr analyze 10x-sc-5gex \
--species hsa \
input_R1.fastq.gz \
input_R2.fastq.gz \
output

For 10x 3' v3, use the command below:

mixcr analyze 10x-sc-5gex \
--species hsa \
-MrefineTagsAndSort.whitelists.CELL=builtin:3M-febrary-2018 \
--tag-pattern "^(CELL:N{16})(UMI:N{12})\^(R2:*)" \
input_R1.fastq.gz \
input_R2.fastq.gz \
output
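To make the two layouts concrete, here is a small bash sketch of what the v3 pattern ^(CELL:N{16})(UMI:N{12}) does to the start of R1, and what changes with the 10-nt UMI of v2. It is only an illustration, assuming the 16-nt cell barcode + 10/12-nt UMI layouts discussed in this thread; MiXCR performs this slicing itself when given --tag-pattern, and the function name split_r1 and the read are made up.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: MiXCR does this itself from --tag-pattern.
# Assumed 10x 3' layouts: 16-nt cell barcode followed by a 10-nt (v2)
# or 12-nt (v3) UMI at the start of R1.
split_r1() {
  local r1="$1" chemistry="$2" umi_len
  case "$chemistry" in
    v2) umi_len=10 ;;
    v3) umi_len=12 ;;
    *)  echo "unknown chemistry: $chemistry" >&2; return 1 ;;
  esac
  local cell="${r1:0:16}"        # bases 1-16 -> CELL
  local umi="${r1:16:umi_len}"   # the following 10 or 12 bases -> UMI
  printf 'CELL=%s UMI=%s\n' "$cell" "$umi"
}

# Made-up 28-nt R1 read (not real data)
split_r1 "AAACCCAAGAAACACTACGTACGTACGT" v3
split_r1 "AAACCCAAGAAACACTACGTACGTACGT" v2
```

The only difference between the two chemistries in this sketch is the UMI length; the cell barcode slice is the same 16 nt.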

The resulting output tables will contain a column with the CELL barcode sequence. You can use it to map each clonotype to its cell in the expression data.
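A minimal bash sketch of that mapping step, assuming a tab-separated MiXCR clone table with a header row and one cell barcode per row, plus a plain barcode list from the expression pipeline (for example a decompressed Cell Ranger barcodes.tsv, whose barcodes carry a "-1" suffix that is stripped here). The barcode column name is passed as an argument because it depends on the MiXCR version and export options, so check the header of your own table; the function name match_clones_to_cells is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep only clone-table rows whose cell barcode also
# appears in the expression data. Assumptions (check against your files):
#   $1 = barcode list, one barcode per line (a "-N" suffix is stripped)
#   $2 = tab-separated clone table with a header row
#   $3 = name of the cell-barcode column in that table
match_clones_to_cells() {
  local barcodes="$1" clones="$2" col_name="$3"
  awk -F'\t' -v OFS='\t' -v col="$col_name" '
    # first file: collect expression barcodes without the -1 style suffix
    NR == FNR { sub(/-[0-9]+$/, "", $1); cells[$1] = 1; next }
    # header of the clone table: locate the barcode column, keep the header
    FNR == 1  { for (i = 1; i <= NF; i++) if ($i == col) c = i; print; next }
    # data rows: print only those whose barcode is a known cell
    c && ($c in cells)
  ' "$barcodes" "$clones"
}
```

Usage might look like match_clones_to_cells barcodes.tsv clones.tsv <barcode column> > clones_in_cells.tsv. Rows whose barcode is absent from the expression data (for example cells removed during QC) are dropped, and the header is always kept so the result can be loaded next to the expression matrix.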

Do you require assistance with smart-seq2 data?

malonzm1 commented 10 months ago

Thanks. In the above examples, does input_R2.fastq.gz contain the barcode+UMI sequence?

mizraelson commented 10 months ago

Yes, everything follows the 10x protocol.

malonzm1 commented 10 months ago

Thanks. What if there are two coding fastq files?

mizraelson commented 10 months ago

Sorry, I misled you: in 10x, the CELL and UMI barcodes are in R1, and the "coding" sequence is in R2.

Do you mean that you have a longer R1 which also covers some sequence in addition to barcodes?

malonzm1 commented 10 months ago

Thanks. Yes.

mizraelson commented 10 months ago

Then you can use --tag-pattern "^(CELL:N{16})(UMI:N{10})(R1:*)\^(R2:*)" and --tag-pattern "^(CELL:N{16})(UMI:N{12})(R1:*)\^(R2:*)" for V2 and V3 respectively.
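The four 10x 3' patterns in this thread differ only in the UMI length and in whether the rest of R1 is kept as (R1:*). A hypothetical bash helper that assembles them, assuming exactly those layouts (16-nt cell barcode, 10-nt UMI for v2, 12-nt for v3):

```shell
#!/usr/bin/env bash
# Hypothetical helper assembling the 10x 3' tag patterns from this thread.
#   umi_len: 10 for v2, 12 for v3
#   keep_r1: "yes" to keep the rest of a long R1 as (R1:*), anything else to ignore it
tag_pattern_10x() {
  local umi_len="$1" keep_r1="$2"
  local pattern="^(CELL:N{16})(UMI:N{${umi_len}})"
  if [ "$keep_r1" = "yes" ]; then
    pattern+="(R1:*)"
  fi
  pattern+='\^(R2:*)'   # single quotes keep the literal backslash read separator
  printf '%s\n' "$pattern"
}

tag_pattern_10x 10 yes   # v2, rest of R1 kept
tag_pattern_10x 12 no    # v3, barcodes only
```

The output can be passed straight to the commands above, e.g. --tag-pattern "$(tag_pattern_10x 12 yes)"; the double quotes keep the pattern as a single argument.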

malonzm1 commented 10 months ago

Thanks! I would also like to ask about Smart-seq2, but I have to review my data first.

malonzm1 commented 10 months ago

I read here https://www.biostars.org/p/9529864/#9572200 that in cases where R1 is barcode+UMI, the rest of the read can be effectively ignored. What tag pattern should I use in that case?

malonzm1 commented 10 months ago

Also, do you have a preset for Smart-seq2 cel-seq?

malonzm1 commented 10 months ago

Do you also have a preset for 10x 3' v1?

mizraelson commented 10 months ago

Hi,

1. The preset for Smart-seq2 is smart-seq2-vdj. You can read about it here.
2. To skip the rest of the R1 read, use the original commands provided above (where the pattern doesn't include R1).
3. I couldn't find any information about 3' v1 on the 10x website. If you can share the barcode structure and the whitelist, I can help you with that.

malonzm1 commented 10 months ago

Thanks. https://www.biostars.org/p/462568/

10x v1 (courtesy ATpoint):
- Whitelist: 737K-april-2014_rc.txt
- CB length: 14
- UMI start: 15
- UMI length: 10
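Those numbers translate into a tag pattern by simple arithmetic: the cell barcode occupies bases 1-14, and the UMI starts at base 15 = 14 + 1 and covers bases 15-24, so the two are contiguous and the read starts with ^(CELL:N{14})(UMI:N{10}). A hypothetical bash helper doing this check, assuming 1-based positions and that barcode and UMI sit in the same read:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "CB length / UMI start / UMI length" (1-based,
# barcode and UMI in the same read) into the start of a MiXCR tag pattern.
pattern_from_layout() {
  local cb_len="$1" umi_start="$2" umi_len="$3"
  # Only handle the contiguous case: UMI begins right after the barcode.
  if (( umi_start != cb_len + 1 )); then
    echo "non-contiguous layout; write the pattern by hand" >&2
    return 1
  fi
  printf '^(CELL:N{%s})(UMI:N{%s})\n' "$cb_len" "$umi_len"
}

pattern_from_layout 14 15 10   # 10x v1 values from the table above
pattern_from_layout 16 17 12   # 10x 3' v3 layout, for comparison
```

The helper deliberately refuses layouts with a gap between barcode and UMI rather than guessing how to encode the spacer.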

malonzm1 commented 10 months ago

Hi,

How can I run mixcr with SureCell sequences?

SureCell (18 bp barcode, 8 bp UMI): surecell, ddseq, biorad

Thanks and good day.

mizraelson commented 10 months ago

Hi, are you certain you're using 3' v1? To my knowledge, that kit version was discontinued a long time ago.

From the link you provided, the tag pattern appears as follows: --tag-pattern "^(CELL:N{14})(UMI:N{10})\^(R2:*)"

The corresponding command would be:

mixcr analyze 10x-sc-5gex \
--species hsa \
-MrefineTagsAndSort.whitelists.CELL=builtin:737K-april-2014_rc \
--tag-pattern "^(CELL:N{14})(UMI:N{10})\^(R2:*)" \
input_R1.fastq.gz \
input_R2.fastq.gz \
output

However, based on this link I found, the cell barcode is actually located in the Index read.

Given this, the command should be:

mixcr analyze 10x-sc-5gex \
--species hsa \
-MrefineTagsAndSort.whitelists.CELL=builtin:737K-april-2014_rc \
--tag-pattern "^(UMI:N{10})\^(R2:*)\^(CELL:N{14})" \
input_R1.fastq.gz \
input_R2.fastq.gz \
input_I1.fastq.gz \
output
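The three-read layout can be pictured with a small bash sketch that pairs each R1 record with the matching I1 record and prints the cell barcode (first 14 nt of I1) and the UMI (first 10 nt of R1). It assumes uncompressed FASTQ files whose records are in the same order, and the function name is hypothetical; MiXCR reads the files itself, so this is only for inspecting the data.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the v1 three-read layout (UMI in R1, CELL in I1).
# Prints "CELL<TAB>UMI" per read pair; inputs are plain (uncompressed) FASTQ
# files with records in the same order.
cell_umi_from_r1_i1() {
  local r1="$1" i1="$2"
  # line 2 of every 4-line FASTQ record is the sequence
  paste <(awk 'NR % 4 == 2' "$r1") <(awk 'NR % 4 == 2' "$i1") |
    awk -F'\t' -v OFS='\t' '{ print substr($2, 1, 14), substr($1, 1, 10) }'
}
```

For gzipped input, one option is to decompress to temporary files first, for example with gzip -dc input_R1.fastq.gz > R1.fastq.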

Illumina Bio-Rad SureCell 3' WTA for ddSEQ

We haven't tried this protocol before. Since it's a 3' whole-transcriptome protocol, we don't expect a substantial number of TCR/BCR clones from it. However, based on the link shared above, the command should look like this:

mixcr analyze generic-single-cell-gex-with-umi \
--species hsa \
--tag-pattern "^(CELL1:N{6})tagccatcgcattgc(CELL2:N{6})tacctctgagctgaa(CELL3:N{6})acg(UMI:N{8})\^(R2:*)" \
input_R1.fastq.gz \
input_R2.fastq.gz \
output
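To check whether a SureCell read 1 actually follows this layout (three 6-nt barcode blocks separated by fixed linkers, then ACG and an 8-nt UMI), the same structure can be written as a bash regular expression. This is a hypothetical sketch: the regex requires exact linker matches, whereas MiXCR's own pattern matcher may tolerate sequencing errors, and the read below is synthetic.

```shell
#!/usr/bin/env bash
# Hypothetical check of the SureCell R1 structure used in the tag pattern above.
# Exact-match regex only; real reads with linker errors will be rejected here.
parse_surecell_r1() {
  local r1="$1"
  local re='^([ACGTN]{6})TAGCCATCGCATTGC([ACGTN]{6})TACCTCTGAGCTGAA([ACGTN]{6})ACG([ACGTN]{8})'
  if [[ $r1 =~ $re ]]; then
    # CELL1 + CELL2 + CELL3 together form the 18-bp cell barcode
    printf 'CELL=%s%s%s UMI=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    return 1
  fi
}

# Synthetic read: barcode blocks ACGTAC / GGCCTT / TTAACC, UMI CAGTCAGT
parse_surecell_r1 ACGTACTAGCCATCGCATTGCGGCCTTTACCTCTGAGCTGAATTAACCACGCAGTCAGTGACTTTTTTT
```

Splitting the barcode into CELL1, CELL2 and CELL3 in the tag pattern mirrors the three capture groups here; concatenated, they give the 18-bp barcode mentioned in the question.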

Sincerely, Mark

malonzm1 commented 10 months ago

Thanks!

Do you also have a preset for Fluidigm C1?

mizraelson commented 10 months ago

Can you share the protocol that you used for cDNA library construction? To my knowledge, the Fluidigm C1 platform can be used with different chemistries.

If you can describe the library structure (UMIs, cell barcodes, any technical sequences, whether all cells are in a single pair of FASTQ files or each cell's sequences are in a distinct pair of FASTQ files, etc.), I will help you with the command.

mizraelson commented 10 months ago

Is it related to #1105?

malonzm1 commented 10 months ago

Yes. Many thanks!