milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

[Request] Using smart-seq2-vdj without "cell" substring as cell tag #1502

Closed krkathuria closed 10 months ago

krkathuria commented 10 months ago

App version: 4.6.0; built=Sat Dec 09 11:48:42 PST 2023; rev=c9fafa41fe; lib=repseqio.v4.0

Expected Result

I have generated data using Smart-Seq-2 and am trying to run "mixcr analyze" with the "smart-seq2-vdj" preset.

Actual Result

The command returned the following error: Must contain at least one Cell tag (determined as tag name starting from "cell" (like "CELL1", "Cell", "CellId", etc..)).

This is occurring because my FASTQ file names do not contain the substring "cell", as is required. Instead, I use a combination of the DNA barcode sequence and other identifying information to label each cell.

Request: Would it be possible to edit the preset so it can be used without having the substring "cell" in the fastq name?

Exact MiXCR commands

mixcr analyze smart-seq2-vdj --species hsa /usr/directory/B-HA_Sp40-44_p05c01r01_8_01_N19_TGGACTGGAACA_ATAAAGAACCCG-trimmed-pair1.fastq.gz /usr/directory/B-HA_Sp40-44_p05c01r01_8_01_N19_TGGACTGGAACA_ATAAAGAACCCG-trimmed-pair2.fastq.gz B-HA_Sp40-44_p05c01r01

MiXCR report files

None

mizraelson commented 10 months ago

Do you have a pair of FASTQ files for each cell?

krkathuria commented 10 months ago

Yes, I do.

mizraelson commented 10 months ago

Then, in the command you should replace a part of input file name that marks each well with {{CELL:a}}.

E.g.:

mixcr analyze smart-seq2-vdj \
    --species hsa \
    /usr/directory/{{CELL:a}}-trimmed-pair1.fastq.gz \
    /usr/directory/{{CELL:a}}-trimmed-pair2.fastq.gz \
    B-HA_Sp40-44_p05c01r01

If you run the command above, 44_p05c01r01_8_01_N19_TGGACTGGAACA_ATAAAGAACCCG will be treated as a cell ID. Does that make sense?
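
To preview which part of each file name the {{CELL:a}} placeholder covers in the pattern as written, one could list it with plain shell suffix stripping before running MiXCR. This is only a rough sketch, not MiXCR's own matching logic; the function name list_cell_ids is hypothetical, and the suffix is taken from the command above.

```shell
# Hypothetical helper: print the text that {{CELL:a}} spans in
# "<dir>/{{CELL:a}}-trimmed-pair1.fastq.gz", one candidate per line.
# This approximates the pattern with suffix stripping; it does not call MiXCR.
list_cell_ids() {
    dir=$1
    for f in "$dir"/*-trimmed-pair1.fastq.gz; do
        [ -e "$f" ] || continue                          # glob matched nothing
        name=${f##*/}                                    # drop the directory part
        printf '%s\n' "${name%-trimmed-pair1.fastq.gz}"  # drop the fixed suffix
    done
}
```

Calling list_cell_ids /usr/directory would then print one line per pair1 file found in that directory.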

krkathuria commented 10 months ago

Hi, yes I understand. But unfortunately, since the FASTQ files for all cells are in a single directory, this may not be possible. The input between the different cells would be indistinguishable for mixcr.

I ran mixcr analyze rna-seq instead, followed by exportAirr, to get one AIRR-formatted TSV per cell, which I concatenated into a single file for downstream analysis in scanpy. Do you see any reason this would be suboptimal compared with running the smart-seq2-vdj preset?
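
For reference, the concatenation step could look roughly like the sketch below. It assumes every per-cell TSV has the same header line; the function name merge_airr_tsv and the added source_file column are hypothetical and only serve to keep track of which cell each row came from.

```shell
# Hypothetical helper: merge per-cell TSV files that share one header.
# The header is printed once, and each data row is prefixed with the
# name of the file it came from (assumed to identify the cell).
merge_airr_tsv() {
    awk 'BEGIN { FS = OFS = "\t" }
         FNR == 1 { if (!seen++) print "source_file", $0; next }  # header: keep first only
         { print FILENAME, $0 }' "$@"
}
```

It would be called with the per-cell files as arguments and its output redirected to the combined file.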

mizraelson commented 10 months ago

All files should be in the same directory. That is the whole point. MiXCR will aggregate all files for all cells and process them all together, assigning a part of file name marked by {{CELL:a}} to each cell. In the end you will have a clonotype table where you will see a cell id for every clone. E.g.:

Cell ID                                           Clone
44_p05c01r01_8_01_N19_ACGTACGTACGT_ACGTACGTACGT   CloneA
46_p05c01r01_8_01_N19_GTCAGTCAGTCA_GTCAGTCAGTCA   CloneB
47_p05c01r01_8_01_N19_TGACTGACTGAC_TGACTGACTGAC   CloneC
48_p05c01r01_8_01_N19_CAGTCAGTCAGT_CAGTCAGTCAGT   CloneD

Just run the command below and check the output:

mixcr analyze smart-seq2-vdj \
    --species hsa \
    /usr/directory/{{CELL:a}}-trimmed-pair1.fastq.gz \
    /usr/directory/{{CELL:a}}-trimmed-pair2.fastq.gz \
    B-HA_Sp40-44_p05c01r01
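
Since all cells are picked up from one directory, it may help to confirm beforehand that every pair1 file has a matching pair2 file. The following is a minimal sketch under the file naming used in this thread; the function name check_pairs is hypothetical and the check does not involve MiXCR itself.

```shell
# Hypothetical pre-flight check: report every *-trimmed-pair1.fastq.gz
# file in a directory that has no *-trimmed-pair2.fastq.gz counterpart.
# Returns success only when no mate is missing.
check_pairs() {
    dir=$1
    missing=0
    for r1 in "$dir"/*-trimmed-pair1.fastq.gz; do
        [ -e "$r1" ] || continue                        # glob matched nothing
        r2=${r1%-trimmed-pair1.fastq.gz}-trimmed-pair2.fastq.gz
        if [ ! -e "$r2" ]; then
            printf 'missing mate for %s\n' "${r1##*/}" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Running check_pairs /usr/directory before the analyze command would print any unpaired file names to standard error.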