milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Duplicates of found alleles #1853

Open Thopic opened 1 week ago

Thopic commented 1 week ago

Hi, thanks for the great work with MiXCR. I have an issue that's very similar to #1823.


Expected Result

I'm running mixcr findAlleles on *.clna files and expect the command to complete successfully, modifying the files as needed.

Actual Result

Instead, at the end of step 5 of 6 (searching for V alleles), I receive this error:

There are duplicates of found alleles, [IGHV3-30-5*18x found on IGHV3-30*18] repeated

I suspect the issue originates in the previous step (mixcr assemble), where I specified the option --assemble-clonotypes-by {CDR1Begin:FR4End}, which seems to be incompatible with findAlleles. These two alleles are likely picked because they differ only in the FR1 region. Unfortunately, assembling by the full VDJRegion isn't possible with my data (I don't have FR1).
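To make the suspicion above concrete, here is a toy shell sketch of how two alleles that differ only in FR1 become indistinguishable once FR1 is excluded. Everything in it is an assumption for illustration: the sequences are made up (not real IGHV germline sequences), the names alleleA/alleleB and the FR1_LEN value are hypothetical, and this is not how findAlleles works internally.

```shell
#!/usr/bin/env bash
# Toy illustration only: made-up sequences, not real IGHV germline data.
# FR1_LEN is a hypothetical FR1 length chosen to fit the toy sequences.
FR1_LEN=6

# name<TAB>sequence; the two toy alleles differ only within the first FR1_LEN bases.
alleles=$(printf '%s\t%s\n' \
  'alleleA' 'ACGTACGGATTACAGGCT' \
  'alleleB' 'ACCTACGGATTACAGGCT')

# Drop the FR1 part of each sequence, then list sequences that occur more than once.
duplicates=$(printf '%s\n' "$alleles" \
  | awk -F'\t' -v n="$FR1_LEN" '{ print substr($2, n + 1) }' \
  | sort | uniq -d)

echo "Sequences that collide once FR1 is excluded: ${duplicates}"
```

If the same thing happens with IGHV3-30*18 and IGHV3-30-5*18 over CDR1Begin:FR4End, any allele found on that region could be reported for both genes, which would match the error message.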

Is there a way to bypass or ignore this issue?

Exact MiXCR commands

mixcr align -t 100 --preset generic-amplicon-with-umi --species hsa --rna \
--tag-pattern '^(UMI:N{8})(R1:*)\^(MIU:N{8})(R2:*)' \
--floating-left-alignment-boundary \
--rigid-right-alignment-boundary C \
-OsaveOriginalReads=true \
${SAVE_DIR}joined_fastq/${SAMPLE}_R1.fastq.gz \
${SAVE_DIR}joined_fastq/${SAMPLE}_R2.fastq.gz \
${SAVE_DIR}align/${SAMPLE}.vdjca;

mixcr refineTagsAndSort --dont-correct ${SAVE_DIR}align/${SAMPLE}.vdjca ${SAVE_DIR}align/${SAMPLE}.sorted.vdjca;

mixcr assemble \
--write-alignments --assemble-clonotypes-by {CDR1Begin:FR4End} \
${SAVE_DIR}align/${SAMPLE}.sorted.vdjca ${SAVE_DIR}clones/${SAMPLE}.clna;

mixcr findAlleles -Xmx240g -t 100 \
--dont-remove-unused-genes \
--output-template {file_dir_path}/{file_name}.corrected_alleles.clns \
--export-library Alleles/blod_alleles_${INDIV}.fasta --export-alleles-mutations \
Alleles/blod_alleles_${INDIV}_descr.csv ${SAMPLES};

MiXCR report files

Errors:

WARNING: Didn't reproduce allele on subset of IGHV3-64*02: [IGHV3-64*04-M6-CDR3-1-XC57SDFWG2ZQM2G4YM5YSTJL, IGHV3-64*04-M2-AX5G7BJPK4KL3RF4EA3JYVOS]
WARNING: Didn't reproduce allele on subset of IGHV3-66*03: [IGHV3-66*02-M2-R4D4W6KGKHVZ3PGORK3NW5VK]
WARNING: Didn't reproduce allele on subset of IGHV3-33*01: [IGHV3-33*01-M5-R77A3A3GB4V5WY7CCF3O72NT, IGHV3-33*01-M1-6QJAKJEK4G2AEDWVABX63Z53]
WARNING: Didn't reproduce allele on subset of IGHV3-30-3*01: [IGHV3-30-3*01-M1-C5CBOPP644A4SPWEI7MJWQ7A, IGHV3-30-3*01-M6-QCXBI76YSFGL56GJAIRKWNIQ]
WARNING: Didn't reproduce allele on subset of IGHV4-59*01: [IGHV4-59*04-M2-CDR3-5-24IP3YCLLUIMYXJHGFMEPMGN]
App version: 4.7.0; built=Wed Aug 07 15:19:48 EDT 2024; rev=976ba14139; lib=repseqio.v5.1
There are duplicates of found alleles, [IGHV3-30-5*18x found on IGHV3-30*18] repeated

Output:

Step 1 of 6: count diversity for dataset: 0%
Step 1 of 6: count diversity for dataset: 15%  ETA: 00:00:17
Step 1 of 6: count diversity for dataset: 26.6%  ETA: 00:00:12
Step 1 of 6: count diversity for dataset: 38.1%  ETA: 00:00:10
Step 1 of 6: count diversity for dataset: 49.2%  ETA: 00:00:09
Step 1 of 6: count diversity for dataset: 61.9%  ETA: 00:00:08
Step 1 of 6: count diversity for dataset: 72.1%  ETA: 00:00:05
Step 1 of 6: count diversity for dataset: 82.4%  ETA: 00:00:03
Step 1 of 6: count diversity for dataset: 92.8%  ETA: 00:00:01
Step 2 of 6: grouping by the same J gene: 0%
Step 2 of 6: grouping by the same J gene: 10.2%  ETA: 00:00:26
Step 2 of 6: grouping by the same J gene: 24.6%  ETA: 00:00:20
Step 2 of 6: grouping by the same J gene: 36.6%  ETA: 00:00:21
Step 2 of 6: grouping by the same J gene: 47.3%  ETA: 00:00:14
Step 2 of 6: grouping by the same J gene: 59.6%  ETA: 00:00:16
Step 2 of 6: grouping by the same J gene: 71%  ETA: 00:00:12
Step 2 of 6: grouping by the same J gene: 83.6%  ETA: 00:00:05
Step 2 of 6: grouping by the same J gene: 93.7%  ETA: 00:00:03
Step 2 of 6: Searching for J alleles: 0%
Step 2 of 6: Searching for J alleles: 10.5%  ETA: 00:01:27
Step 2 of 6: Searching for J alleles: 26.3%  ETA: 00:00:04
Step 2 of 6: Searching for J alleles: 84.2%  ETA: 00:00:00
Step 3 of 6: grouping by the same V gene: 0%
Step 3 of 6: grouping by the same V gene: 12.5%  ETA: 00:00:21
Step 3 of 6: grouping by the same V gene: 22.8%  ETA: 00:00:15
Step 3 of 6: grouping by the same V gene: 35.6%  ETA: 00:00:15
Step 3 of 6: grouping by the same V gene: 48%  ETA: 00:00:12
Step 3 of 6: grouping by the same V gene: 60.6%  ETA: 00:00:12
Step 3 of 6: grouping by the same V gene: 72.1%  ETA: 00:00:07
Step 3 of 6: grouping by the same V gene: 83.2%  ETA: 00:00:04
Step 3 of 6: grouping by the same V gene: 93.7%  ETA: 00:00:01
Step 3 of 6: Searching for V alleles: 0%
Step 3 of 6: Searching for V alleles: 11.6%  ETA: 00:00:15
Step 3 of 6: Searching for V alleles: 25%  ETA: 00:00:05
Step 3 of 6: Searching for V alleles: 42.4%  ETA: 00:00:03
Step 3 of 6: Searching for V alleles: 57.6%  ETA: 00:00:11
Step 3 of 6: Searching for V alleles: 76.7%  ETA: 00:00:02
Step 3 of 6: Searching for V alleles: 87.2%  ETA: 00:00:02
Step 3 of 6: Searching for V alleles: 97.7%  ETA: 00:00:01
Step 4 of 6: grouping by the same J gene: 0%
Step 4 of 6: grouping by the same J gene: 10.2%  ETA: 00:00:26
Step 4 of 6: grouping by the same J gene: 23.6%  ETA: 00:00:17
Step 4 of 6: grouping by the same J gene: 37.4%  ETA: 00:00:18
Step 4 of 6: grouping by the same J gene: 48.5%  ETA: 00:00:13
Step 4 of 6: grouping by the same J gene: 59.6%  ETA: 00:00:14
Step 4 of 6: grouping by the same J gene: 70.2%  ETA: 00:00:14
Step 4 of 6: grouping by the same J gene: 82.2%  ETA: 00:00:05
Step 4 of 6: grouping by the same J gene: 92.6%  ETA: 00:00:02
Step 4 of 6: Searching for J alleles: 0%
Step 4 of 6: Searching for J alleles: 10.5%  ETA: 00:01:25
Step 4 of 6: Searching for J alleles: 26.3%  ETA: 00:00:04
Step 4 of 6: Searching for J alleles: 68.4%  ETA: 00:00:00
Step 4 of 6: Searching for J alleles: 78.9%  ETA: 00:00:32
Step 4 of 6: Searching for J alleles: 89.5%  ETA: 00:00:42
Step 5 of 6: grouping by the same V gene: 0%
Step 5 of 6: grouping by the same V gene: 10.7%  ETA: 00:00:25
Step 5 of 6: grouping by the same V gene: 25.9%  ETA: 00:00:14
Step 5 of 6: grouping by the same V gene: 39.6%  ETA: 00:00:13
Step 5 of 6: grouping by the same V gene: 51.7%  ETA: 00:00:12
Step 5 of 6: grouping by the same V gene: 62.2%  ETA: 00:00:10
Step 5 of 6: grouping by the same V gene: 74%  ETA: 00:00:06
Step 5 of 6: grouping by the same V gene: 84.3%  ETA: 00:00:04
Step 5 of 6: grouping by the same V gene: 97.1%  ETA: 00:00:00
Step 5 of 6: Searching for V alleles: 0%
Step 5 of 6: Searching for V alleles: 12.2%  ETA: 00:00:14
Step 5 of 6: Searching for V alleles: 33.7%  ETA: 00:00:06
Step 5 of 6: Searching for V alleles: 48.8%  ETA: 00:00:03
Step 5 of 6: Searching for V alleles: 65.7%  ETA: 00:00:08
Step 5 of 6: Searching for V alleles: 76.2%  ETA: 00:00:06
Step 5 of 6: Searching for V alleles: 88.4%  ETA: 00:00:01
Step 5 of 6: Searching for V alleles: 98.3%  ETA: 00:00:21
Step 5 of 6: Searching for V alleles: 99.4%  ETA: 00:01:00
Step 5 of 6: Searching for V alleles: 99.4%

Thanks!

mizraelson commented 1 week ago

Do you run findAlleles on samples from the same biological donor?

Thopic commented 1 week ago

Yes, only one person.