Closed: malonzm1 closed this issue 10 months ago.
Hi. Do you mean mapping bulk TCR repertoire sequencing (assembled clones without cell barcodes) to scRNA-seq?
Thanks. I generated a TCR repertoire with MiXCR from scRNA-seq data. I would like to know which cells actually contain the clonotypes. Does that make sense?
What MiXCR command did you use?
mixcr analyze rna-seq -s hsa --rna 1.fastq.gz 2.fastq.gz outdir
and
mixcr analyze rna-seq -s hsa --rna {{CELL0ROW:a}}_1.fastq.gz {{CELL0ROW:a}}_2.fastq.gz outdir
but the data are scRNA-seq (I ran into errors when using the 10x and Smart-seq2 presets).
This preset is designed for bulk RNA-seq data. What protocol did you use for scRNA-seq? Was it 10x? What error did you get?
I used 10x and Smart-seq2. For 10x, there's no preset for 3'.
Which 10x chemistry did you use, v2 or v3? Both are available for 3', and they differ in UMI length (10 nt in v2, 12 nt in v3) and in the cell barcode whitelist. Also, did you encounter any errors using 'smart-seq2'? Can you share the text of the error?
What are the presets for 3' 10x v2 and v3? Can I map the TCR clonotypes to individual cells?
We originally didn't include the 3' preset because the clonotype yield from this type of data is significantly low.
If you're using 3' 10x v2, the barcode structure is identical to 5', so you can use the following preset:
mixcr analyze 10x-sc-5gex \
--species hsa \
input_R1.fastq.gz \
input_R2.fastq.gz \
output
For 3' 10x v3, use the command below:
mixcr analyze 10x-sc-5gex \
--species hsa \
-MrefineTagsAndSort.whitelists.CELL=builtin:3M-febrary-2018 \
--tag-pattern "^(CELL:N{16})(UMI:N{12})\^(R2:*)" \
input_R1.fastq.gz \
input_R2.fastq.gz \
output
The resulting output tables will contain a column with the CELL barcode sequence. You can use it to match each clonotype to the corresponding cell in your expression data.
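A minimal sketch of that join in Python, assuming the clone table was exported as TSV with one row per clone-cell pair. The column names `tagValueCELL` and `aaSeqCDR3` are assumptions: check the header of your own export and adjust them. The barcode list stands in for whatever your expression object uses (for example Cell Ranger's `barcodes.tsv`, whose entries carry a `-1` suffix that the sketch strips before comparing).

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical column names: replace them with the headers of your own MiXCR export.
CELL_COLUMN = "tagValueCELL"
CDR3_COLUMN = "aaSeqCDR3"


def normalize_barcode(barcode: str) -> str:
    """Strip a Cell Ranger-style '-1' suffix so barcodes from both sources compare equal."""
    return barcode.split("-", 1)[0]


def map_clonotypes_to_cells(
    clone_lines: Iterable[str], expression_barcodes: Iterable[str]
) -> Dict[str, List[str]]:
    """Return {cell barcode: [CDR3, ...]} for cells that also appear in the expression data."""
    wanted = {normalize_barcode(b) for b in expression_barcodes}
    cell_to_cdr3: Dict[str, List[str]] = {}
    # Each TSV row is assumed to be one clone observed in one cell.
    for row in csv.DictReader(clone_lines, delimiter="\t"):
        cell = normalize_barcode(row[CELL_COLUMN])
        if cell in wanted:
            cell_to_cdr3.setdefault(cell, []).append(row[CDR3_COLUMN])
    return cell_to_cdr3
```

In practice you would open the export with something like `open("clones.tsv", newline="")` (hypothetical file name) and pass the file handle plus your cell barcodes; cells with more than one chain simply end up with several CDR3 entries.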
Do you require assistance with smart-seq2 data?
Thanks. In the above examples, does input_R2.fastq.gz contain the barcode+UMI sequence?
Yes, everything is according to 10x protocol.
Thanks. What if there are two coding fastq files?
Sorry, I misled you: in 10x, the CELL barcode and UMI are in R1, and the "coding" sequence is in R2.
Do you mean that you have a longer R1 which also covers some sequence in addition to barcodes?
Thanks. Yes.
Then you can use --tag-pattern "^(CELL:N{16})(UMI:N{10})(R1:*)\^(R2:*)"
and --tag-pattern "^(CELL:N{16})(UMI:N{12})(R1:*)\^(R2:*)"
for V2 and V3 respectively.
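To make the two patterns concrete, here is an illustrative Python sketch of how an anchored `^(CELL:N{16})(UMI:N{10 or 12})(R1:*)` pattern carves R1 up by position. MiXCR does this parsing itself; the helper `split_r1` is hypothetical and only shows which bases end up as CELL, UMI and the remaining R1 insert.

```python
from typing import NamedTuple

# Lengths taken from the tag patterns above: CELL is N{16} for both chemistries,
# UMI is N{10} for v2 and N{12} for v3.
CELL_LENGTH = 16
UMI_LENGTH = {"v2": 10, "v3": 12}


class ParsedR1(NamedTuple):
    cell: str
    umi: str
    insert: str  # whatever remains after CELL+UMI, i.e. the (R1:*) part


def split_r1(sequence: str, chemistry: str) -> ParsedR1:
    """Split a 10x R1 read by position, mirroring ^(CELL:N{16})(UMI:N{k})(R1:*)."""
    umi_end = CELL_LENGTH + UMI_LENGTH[chemistry]
    if len(sequence) < umi_end:
        raise ValueError(f"read is shorter than the {umi_end}-nt barcode+UMI prefix")
    return ParsedR1(
        cell=sequence[:CELL_LENGTH],
        umi=sequence[CELL_LENGTH:umi_end],
        insert=sequence[umi_end:],
    )
```

With the standard 28-cycle v3 R1 (26 cycles for v2), `insert` comes back empty, which is the situation the original patterns without `(R1:*)` cover.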
Thanks! I would also like to ask about Smart-seq2, but I have to review my data first.
I read here https://www.biostars.org/p/9529864/#9572200 that in cases where R1 is barcode+UMI, the rest of the read can be effectively ignored. What tag pattern should I use in that case?
Also, do you have a preset for Smart-seq2 cel-seq?
Do you also have a preset for 10x 3' v1?
Hi,
1) The preset for Smart-seq2 is smart-seq2-vdj. You can read about it here.
2) Use the original commands provided above (where the pattern doesn't include R1) to skip the rest of the R1 read.
3) I couldn't find any info regarding 3' v1 on the 10x website. If you can share the barcode structure and the whitelist, I can help you with that.
Thanks. https://www.biostars.org/p/462568/
10x v1 (courtesy ATpoint):
- Whitelist: 737K-april-2014_rc.txt
- CB length: 14
- UMI start: 15
- UMI length: 10
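The summary gives 1-based coordinates. Assuming the cell barcode starts at position 1 (the summary does not state this), a UMI starting at position 15 directly follows the 14-nt barcode, which is exactly what two adjacent groups `(CELL:N{14})(UMI:N{10})` encode. A small hypothetical Python check of that arithmetic, which builds the corresponding single-read pattern:

```python
# Coordinates from the 10x v1 summary (1-based). CB_START = 1 is an assumption.
CB_START, CB_LENGTH = 1, 14
UMI_START, UMI_LENGTH = 15, 10


def slice_1based(read: str, start: int, length: int) -> str:
    """Return `length` bases starting at 1-based position `start`."""
    return read[start - 1 : start - 1 + length]


def v1_tag_pattern() -> str:
    """Build a MiXCR-style pattern, assuming CB and UMI sit back to back in R1."""
    if CB_START + CB_LENGTH != UMI_START:
        raise ValueError("CB and UMI are not contiguous; the gap would need its own N{k} block")
    return rf"^(CELL:N{{{CB_LENGTH}}})(UMI:N{{{UMI_LENGTH}}})\^(R2:*)"
```

`v1_tag_pattern()` returns `^(CELL:N{14})(UMI:N{10})\^(R2:*)`. If the barcode instead comes from a separate index read, the positions no longer describe a single read and this check does not apply.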
Hi,
How can I run mixcr with SureCell sequences?
SureCell (18 bp barcode, 8 bp UMI): surecell, ddseq, biorad
Thanks and good day.
Hi, are you certain you're using 3' v1? To my knowledge, that kit version was discontinued a long time ago.
From the link you provided, the tag pattern appears as follows:
--tag-pattern "^(CELL:N{14})(UMI:N{10})\^(R2:*)"
The corresponding command would be:
mixcr analyze 10x-sc-5gex \
--species hsa \
-MrefineTagsAndSort.whitelists.CELL=builtin:737K-april-2014_rc \
--tag-pattern "^(CELL:N{14})(UMI:N{10})\^(R2:*)" \
input_R1.fastq.gz \
input_R2.fastq.gz \
output
However, based on this link I found, the cell barcode is actually located in the index read (I1).
Given this, the command should be:
mixcr analyze 10x-sc-5gex \
--species hsa \
-MrefineTagsAndSort.whitelists.CELL=builtin:737K-april-2014_rc \
--tag-pattern "^(UMI:N{10})\^(R2:*)\^(CELL:N{14})" \
input_R1.fastq.gz \
input_R2.fastq.gz \
input_I1.fastq.gz \
output
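One way to read the three-part pattern, assuming MiXCR's `\^` separates the sub-patterns for each read in the same order as the input files are listed: the first segment applies to R1, the second to R2, and the third to I1. The hypothetical helper below just splits a pattern on `\^` and lists which tags each file contributes; it does not validate MiXCR syntax.

```python
import re
from typing import Dict, List

READ_SEPARATOR = r"\^"  # assumed MiXCR separator between per-read sub-patterns


def tags_per_read(tag_pattern: str, input_files: List[str]) -> Dict[str, List[str]]:
    """Map each input file to the tag names its sub-pattern captures (hypothetical helper)."""
    segments = tag_pattern.split(READ_SEPARATOR)
    if len(segments) != len(input_files):
        raise ValueError(
            f"pattern has {len(segments)} read segments but {len(input_files)} files were given"
        )
    # A capture group looks like "(NAME:...)"; collect NAME for every group in the segment.
    return {
        path: re.findall(r"\((\w+):", segment)
        for path, segment in zip(input_files, segments)
    }
```

For the index-read variant, R1 contributes only `UMI`, R2 only the `R2` payload, and I1 the `CELL` barcode, which is why the command lists three input files.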
We haven't tried this protocol before. Since it's a 3' transcriptome protocol, we don't anticipate a substantial number of TCR/BCR clones from it. However, based on the link shared above, the command should look like this:
mixcr analyze generic-single-cell-gex-with-umi \
--species hsa \
--tag-pattern "^(CELL1:N{6})tagccatcgcattgc(CELL2:N{6})tacctctgagctgaa(CELL3:N{6})acg(UMI:N{8})\^(R2:*)" \
input_R1.fastq.gz \
input_R2.fastq.gz \
output
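The pattern above encodes three 6-nt barcode blocks separated by fixed linkers, followed by `acg` and an 8-nt UMI; concatenating the three blocks gives the 18 bp barcode mentioned in the question. A minimal Python regex sketch of that layout, for illustration only: it accepts exact linker matches and makes no claim about how MiXCR's own matcher handles sequencing errors in the linkers. `parse_surecell_r1` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Layout copied from the tag pattern above, anchored at the start of R1.
SURECELL_R1 = re.compile(
    r"^(?P<cell1>[ACGTN]{6})TAGCCATCGCATTGC"
    r"(?P<cell2>[ACGTN]{6})TACCTCTGAGCTGAA"
    r"(?P<cell3>[ACGTN]{6})ACG"
    r"(?P<umi>[ACGTN]{8})"
)


def parse_surecell_r1(read: str) -> Optional[Tuple[str, str]]:
    """Return (18-nt cell barcode, 8-nt UMI), or None if the linkers do not match exactly."""
    match = SURECELL_R1.match(read.upper())
    if match is None:
        return None
    cell = match["cell1"] + match["cell2"] + match["cell3"]
    return cell, match["umi"]
```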
Sincerely, Mark
Thanks!
Do you also have a preset for Fluidigm C1?
Can you share the protocol that you used for cDNA library construction? To my knowledge, the Fluidigm C1 platform can be used with different chemistries.
If you can describe the library structure, I will help you with the command: UMIs, CELL barcodes, any technical sequences, whether all cells are in a single pair of FASTQ files or each cell's sequences are in a separate pair of FASTQ files, etc.
Is this related to #1105?
Yes. Many thanks!
Hi,
I would like to map the mixcr TCR output to individual cells (using scRNA-seq data). That is, determine which cells contribute to the identified TCR repertoire. Is this possible?
Thanks and good day.