milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

failed to correctAndSortTags #766

Closed 0x1orz closed 1 year ago

0x1orz commented 2 years ago

The cleaned reads carry two UMIs, with the primers located in the V(D)J region; the tag pattern looks like ^(MIF:N{:6})tcga(R1:*) \ ^(MIR:N{:6})ctag(R2:*)
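
For readers unfamiliar with tag patterns, here is a minimal bash sketch of what one half of the pattern above roughly does to a read. It is not MiXCR's matcher. It assumes that N{:6} means "up to 6 bases of any kind", that the linker (tcga or ctag) is matched literally and case-insensitively, and that no mismatches or indels are tolerated. The function name split_read is hypothetical.

```shell
#!/usr/bin/env bash
# Rough approximation of one half of the tag pattern, e.g. ^(MIF:N{:6})tcga(R1:*).
# Assumptions (not taken from MiXCR itself): N{:6} is "0 to 6 bases",
# the linker is a literal, case-insensitive match, and no errors are allowed.
split_read() {
    local seq linker re
    # Normalise both inputs to upper case so the comparison is case-insensitive.
    seq=$(printf '%s' "$1" | tr 'a-z' 'A-Z')
    linker=$(printf '%s' "$2" | tr 'a-z' 'A-Z')
    # Group 1: up to 6 leading bases (the UMI); group 2: the payload after the linker.
    re="^([ACGTN]{0,6})${linker}(.*)\$"
    if [[ $seq =~ $re ]]; then
        printf '%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1   # no UMI/linker found at the start of the read
    fi
}
```

Calling split_read with a forward read and tcga prints the candidate MIF tag and the R1 payload separated by a tab; calling it with a reverse read and ctag does the same for MIR and R2.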

java --version
openjdk 17.0.3-internal 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)

mixcr analyze amplicon \
    --force-overwrite \
    --verbose \
    --threads ${threads} \
    --species hsa  --starting-material rna \
    --umi-pattern "$tag_pattern" \
    --align "--tag-parse-unstranded" \
    --adapters 'no-adapters' \
    --5-end v-primers --3-end  j-primers \
    --assemble "-OassemblingFeatures={FR1Begin:FR4End}" \
    --receptor-type bcr \
    --region-of-interest VDJRegion \
    --export '-p full' \
    --only-productive \
    --report ${samplename}.report \
    ${FQ1} ${FQ2} ${samplename}

log:

The following tags and their roles were recognised:
  Payload tags: R1, R2
  Molecule tags: MIFMIR
Alignment: 0%
Alignment: 1.2%  ETA: 02:40:17
Alignment: 3%  ETA: 01:50:51
Alignment: 4.8%  ETA: 01:48:37
Alignment: 6.4%  ETA: 01:55:24
Alignment: 8%  ETA: 01:53:07
Alignment: 10.1%  ETA: 01:28:57
Alignment: 11.4%  ETA: 02:10:57
Alignment: 13.2%  ETA: 01:39:00
Alignment: 14.9%  ETA: 01:37:21
Alignment: 16.4%  ETA: 01:52:47
Alignment: 18.2%  ETA: 01:33:24
Alignment: 19.8%  ETA: 01:38:37
Alignment: 21.6%  ETA: 01:29:22
Alignment: 23.2%  ETA: 01:34:23
Alignment: 24.8%  ETA: 01:33:33
Alignment: 26.5%  ETA: 01:30:22
Alignment: 27%  ETA: 04:29:00
Alignment: 28.4%  ETA: 01:45:43
Alignment: 29.7%  ETA: 01:43:56
Alignment: 31.5%  ETA: 01:17:50
Alignment: 33.8%  ETA: 00:57:44
Alignment: 36.1%  ETA: 00:55:50
Alignment: 38.3%  ETA: 00:56:55
Alignment: 40%  ETA: 01:08:26
Alignment: 42.1%  ETA: 00:57:15
Alignment: 44.8%  ETA: 00:40:59
Alignment: 47.6%  ETA: 00:36:52
Alignment: 50.2%  ETA: 00:38:52
Alignment: 52.6%  ETA: 00:39:04
Alignment: 55.1%  ETA: 00:36:51
Alignment: 57.5%  ETA: 00:35:04
Alignment: 59.8%  ETA: 00:35:03
Alignment: 62.1%  ETA: 00:33:00
Alignment: 64.7%  ETA: 00:27:31
Alignment: 67.6%  ETA: 00:22:52
Alignment: 70.5%  ETA: 00:19:51
Alignment: 73.8%  ETA: 00:16:11
Alignment: 76.8%  ETA: 00:15:40
Alignment: 79.9%  ETA: 00:12:59
Alignment: 82.7%  ETA: 00:12:10
Alignment: 84.8%  ETA: 00:15:02
Alignment: 86.8%  ETA: 00:13:01
Alignment: 89%  ETA: 00:10:15
Alignment: 91.3%  ETA: 00:07:37
Alignment: 93.3%  ETA: 00:06:38
Alignment: 94.9%  ETA: 00:06:14
Alignment: 96.7%  ETA: 00:03:46
Alignment: 98.8%  ETA: 00:01:04
============== Align Report ==============
Analysis time: 97.57m
Total sequencing reads: 11287896
Successfully aligned reads: 10129768 (89.74%)
Paired-end alignment conflicts eliminated: 121752 (1.08%)
Alignment failed, no hits (not TCR/IG?): 28553 (0.25%)
Alignment failed because of absence of V hits: 5175 (0.05%)
Alignment failed because of absence of J hits: 40255 (0.36%)
No target with both V and J alignments: 759637 (6.73%)
Absent barcode: 324508 (2.87%)
Overlapped: 8911974 (78.95%)
Overlapped and aligned: 8855545 (78.45%)
Alignment-aided overlaps: 226176 (2.55%)
Overlapped and not aligned: 56429 (0.5%)
No CDR3 parts alignments, percent of successfully aligned: 125871 (1.24%)
Partial aligned reads, percent of successfully aligned: 85798 (0.85%)
V gene chimeras: 184063 (1.63%)
J gene chimeras: 2 (0%)
IGL chains: 187 (0%)
IGL non-functional: 1 (0.53%)
IGH chains: 10129581 (100%)
IGH non-functional: 165427 (1.63%)
Realigned with forced non-floating bound: 4555180 (40.35%)
Realigned with forced non-floating right bound in left read: 179557 (1.59%)
Realigned with forced non-floating left bound in right read: 179557 (1.59%)
Correction will be applied to the following tags: MIF, MIR
Initialization: progress unknown
Reading tags: 1.9%
Reading tags: 14%  ETA: 00:00:28
Reading tags: 24.5%  ETA: 00:00:28
Reading tags: 36.4%  ETA: 00:00:26
Reading tags: 48%  ETA: 00:00:22
Reading tags: 59.5%  ETA: 00:00:17
Reading tags: 71.5%  ETA: 00:00:11
Reading tags: 83.5%  ETA: 00:00:06
Reading tags: 95%  ETA: 00:00:01
Counting MIF: 9%
Counting MIF: 23.6%  ETA: 00:00:05
Counting MIF: 39.1%  ETA: 00:00:03
Counting MIF: 52.4%  ETA: 00:00:03
Counting MIF: 71.8%  ETA: 00:00:01
Counting MIF: 98%  ETA: 00:00:00
Correcting MIF: 24%
picocli.CommandLine$ExecutionException: Error while running command (com.milaboratory.mixcr.cli.CommandAnalyze$CommandAmplicon@40c2ce52): java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
    at picocli.CommandLine.executeUserObject(CommandLine.java:1778)
    at picocli.CommandLine.access$900(CommandLine.java:145)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2141)
    at com.milaboratory.mixcr.cli.Main$1.handle(Main.java:83)
    at com.milaboratory.mixcr.cli.Main$1.handle(Main.java:72)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1968)
    at com.milaboratory.mixcr.cli.Main.handleParseResult(Main.java:94)
    at com.milaboratory.mixcr.cli.Main.main(Main.java:66)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
    at com.milaboratory.core.sequence.SequenceQuality.value(SequenceQuality.java:144)
    at com.milaboratory.core.sequence.SequencesUtils.qualityForMutation(SequencesUtils.java:196)
    at com.milaboratory.mitool.refinement.TagCorrector.correct(TagCorrector.kt:335)
    at com.milaboratory.mitool.refinement.TagCorrector.correct(TagCorrector.kt:189)
    at com.milaboratory.mixcr.cli.CommandCorrectAndSortTags.run0(CommandCorrectAndSortTags.java:166)
    at com.milaboratory.cli.ACommand.run(ACommand.java:112)
    at com.milaboratory.mixcr.cli.CommandAnalyze.run0(CommandAnalyze.java:702)
    at com.milaboratory.cli.ACommand.run(ACommand.java:112)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1769)
    ... 7 more

dbolotin commented 2 years ago

Thanks for reporting! This seems to be a known bug with multiple UMIs; the fix will be available in 4.1. Sorry for the inconvenience! If you are interested in a pre-release version, please write to support@milaboratories.com and we will get back to you as soon as it is available, with instructions on how to run it. Be aware that the version we share will still be unfinished, and most probably something else (CLI, data format, or analysis performance) will change between it and 4.1.

dbolotin commented 1 year ago

Hi, sorry for the long wait. Please try the new release.

The equivalent of your command with the new CLI should be something like this:

mixcr analyze generic-bcr-umi-amplicon \
    --species hs \
    --rna \
    --tag-pattern "^(MIF:N{:6})tcga(R1:*) \ ^(MIR:N{:6})ctag(R2:*)" \
    --assemble-clonotypes-by '{FR1Begin:FR4End}' \
    -M align.tagUnstranded=true \
    --rigid-left-alignment-boundary \
    --rigid-right-alignment-boundary J \
    input_R1.fastq.gz \
    input_R2.fastq.gz \
    result

Please check it against the docs.