Closed krkathuria closed 10 months ago
Do you have a pair of FASTQ files for each cell?
Yes, I do.
Then, in the command, replace the part of the input file name that identifies each well with {{CELL:a}}.
E.g.:
mixcr analyze smart-seq2-vdj \
--species hsa \
/usr/directory/{{CELL:a}}-trimmed-pair1.fastq.gz \
/usr/directory/{{CELL:a}}-trimmed-pair2.fastq.gz \
B-HA_Sp40-44_p05c01r01
If you run the command above, 44_p05c01r01_8_01_N19_TGGACTGGAACA_ATAAAGAACCCG will be treated as a cell ID. Does that make sense?
Hi, yes I understand. But unfortunately, since the FASTQ files for all cells are in a single directory, this may not be possible. The input between the different cells would be indistinguishable for mixcr.
I ran mixcr analyze rna-seq instead, followed by exportAirr, to get one AIRR-formatted TSV per cell, which I concatenated into a single file for downstream analysis in scanpy. Do you see any reason this would be suboptimal compared to running the smart-seq2-vdj preset?
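For readers following the per-cell workaround described above, here is a minimal shell sketch of the concatenation step. It is not taken from the thread: the function name `merge_airr`, the `<cell>.airr.tsv` file naming, and the added `cell_id` column are all assumptions, and it presumes every per-cell export has the same header and does not already contain a `cell_id` column.

```shell
# merge_airr OUT FILE...
# Concatenate per-cell AIRR TSV files into OUT, keeping a single header
# and prepending a cell_id column derived from each input file name.
# Assumption (hypothetical naming): inputs are called <cell>.airr.tsv.
merge_airr() {
  local out="$1"
  shift
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {
      # First line of each file: work out the cell label from the file name.
      cell = FILENAME
      sub(/.*\//, "", cell)          # drop the directory part
      sub(/\.airr\.tsv$/, "", cell)  # drop the assumed suffix
      # Emit the header only once, for the first record overall.
      if (NR == 1) print "cell_id", $0
      next
    }
    { print cell, $0 }               # data rows: prepend the cell label
  ' "$@" > "$out"
}
```

A call might look like `merge_airr all_cells.tsv exports/*.airr.tsv`; write the output somewhere the input glob cannot match, so the merged file is not read back in as an input.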
All files should be in the same directory. That is the whole point. MiXCR will aggregate the files for all cells and process them together, assigning the part of the file name marked by {{CELL:a}} to each cell. In the end you will have a clonotype table where you will see a cell ID for every clone.
E.g.:
| Cell ID | Clone |
|---|---|
| 44_p05c01r01_8_01_N19_ACGTACGTACGT_ACGTACGTACGT | CloneA |
| 46_p05c01r01_8_01_N19_GTCAGTCAGTCA_GTCAGTCAGTCA | CloneB |
| 47_p05c01r01_8_01_N19_TGACTGACTGAC_TGACTGACTGAC | CloneC |
| 48_p05c01r01_8_01_N19_CAGTCAGTCAGT_CAGTCAGTCAGT | CloneD |
Just run the command below and check the output:
mixcr analyze smart-seq2-vdj \
--species hsa \
/usr/directory/{{CELL:a}}-trimmed-pair1.fastq.gz \
/usr/directory/{{CELL:a}}-trimmed-pair2.fastq.gz \
B-HA_Sp40-44_p05c01r01
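Before running a pattern-based command like the one above, it can help to check which part of each file name varies between cells and whether every pair1 file has its pair2 partner. The following is only an illustrative pre-flight check, not MiXCR's own matching logic; the function name `list_cell_labels` is hypothetical, and the `-trimmed-pair1.fastq.gz` / `-trimmed-pair2.fastq.gz` suffixes are taken from the file names in this thread.

```shell
# list_cell_labels DIR
# For every *-trimmed-pair1.fastq.gz file in DIR, print the varying part of
# the file name (everything before the suffix) if the matching pair2 file
# exists; otherwise print a warning to stderr.
list_cell_labels() {
  local dir="$1" r1 label
  for r1 in "$dir"/*-trimmed-pair1.fastq.gz; do
    [ -e "$r1" ] || continue   # the glob matched nothing
    label="$(basename "$r1" -trimmed-pair1.fastq.gz)"
    if [ -e "$dir/${label}-trimmed-pair2.fastq.gz" ]; then
      printf '%s\n' "$label"
    else
      printf 'missing pair2 for %s\n' "$label" >&2
    fi
  done
}
```

For example, `list_cell_labels /usr/directory` prints one label per complete pair, which gives a quick view of the strings that would distinguish cells in a file-name pattern.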
App version: 4.6.0; built=Sat Dec 09 11:48:42 PST 2023; rev=c9fafa41fe; lib=repseqio.v4.0
Expected Result
I have generated data using Smart-Seq-2 and am trying to run "mixcr analyze" with the "smart-seq2-vdj" preset.
Actual Result
The command returned the following error: Must contain at least one Cell tag (determined as tag name starting from "cell" (like "CELL1", "Cell", "CellId", etc..)).
This is occurring because I do not have the actual substring "cell" in my FASTQ file names, as is required. Instead, I use a combination of DNA barcode sequence and other identifying information to label each cell.
Request: Would it be possible to edit the preset so it can be used without having the substring "cell" in the fastq name?
Exact MiXCR commands
mixcr analyze smart-seq2-vdj --species hsa /usr/directory/B-HA_Sp40-44_p05c01r01_8_01_N19_TGGACTGGAACA_ATAAAGAACCCG-trimmed-pair1.fastq.gz /usr/directory/B-HA_Sp40-44_p05c01r01_8_01_N19_TGGACTGGAACA_ATAAAGAACCCG-trimmed-pair2.fastq.gz B-HA_Sp40-44_p05c01r01
MiXCR report files
None