milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

`Feature for allele search doesn't intersect JGene` error when running `findAlleles` #1706

Closed michael-ford closed 3 weeks ago

michael-ford commented 3 weeks ago

Expected Result

I have run mixcr analyze invivoscribe-human-dna-ighv-leader-lymphotrack and am now trying to run findAlleles to call alleles and novel variants, but it fails with a "Feature for allele search doesn't intersect JGene" error.

Confirmed the following:

  1. The clns file contains clones (checked by running mixcr exportAirr), and all clones appear to have V, D and J calls. Example:
    clone_id        sequence_id     sequence        rev_comp        productive      v_call  d_call  j_call  c_call  sequence_alignment      germline_alignment      complete_vdj    junction        junction_aa     np1     np2     cdr1    cdr1_aa cdr2    cdr2_aa cdr3    cdr3_aa fwr1    fwr1_aa fwr2    fwr2_aa fwr3    fwr3_aa fwr4    fwr4_aa v_score v_cigar d_score d_cigar   j_score j_cigar c_score c_cigar junction_length np1_length      np2_length      v_germline_start        v_sequence_start        v_germline_end  v_sequence_end  d_germline_start        d_sequence_start        d_germline_end  d_sequence_end  j_germline_start        j_sequence_start        j_germline_end  j_sequence_end  c_germline_start        c_sequence_start  c_germline_end  c_sequence_end  v_alignment_start       v_alignment_end d_alignment_start       d_alignment_end j_alignment_start       j_alignment_end c_alignment_start       c_alignment_end duplicate_count
    0       clone.0 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACATGCACTGTCTGTGGTGACTCCATCAGTAGTTACTACTGGAACTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGATATATCTCTTACACTGGGAGCACCACCTACAACCCCTCCCTCAAGAGTCGAGTCTCCATATCAATAGACACGTCCAAGAACCAGTTCTCCCTGAACCTGAGGTCTGTGACCGCTGCGGACACGGCCGTGTATTATTGTGCGAGAGAGGTGTATGTTAATGACTACTACTACGGTATGGACGTCTGG   F       TIGHV4-59*00      IGHD2-8*00      IGHJ6*00                CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACATGCACTGTCTGTGGTGACTCCATCAGTAGTTACTACTGGAACTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGATATATCTCTTACACTGGGAGCACCACCTACAACCCCTCCCTCAAGAGTCGAGTCTCCATATCAATAGACACGTCCAAGAACCAGTTCTCCCTGAACCTGAGGTCTGTGACCGCTGCGGACACGGCCGTGTATTATTGTGCGAGAGAGGTGTATGTTAATGACTACTACTACGGTATGGACGTCTGG     CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGAGGTGTATGCTANNNACTACTACTACGGTATGGACGTCTGG     F       TGTGCGAGAGAGGTGTATGTTAATGACTACTACTACGGTATGGACGTCTGG     CAREVYVNDYYYGMDVW               ATG     GGTGACTCCATCAGTAGTTACTAC        GDSISSYY        ATCTCTTACACTGGGAGCACC   ISYTGST GCGAGAGAGGTGTATGTTAATGACTACTACTACGGTATGGACGTC   AREVYVNDYYYGMDV CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACATGCACTGTCTGT     QVQLQESGPGLVKPSETLSLTCTVCTGGAACTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGATAT      WNWIRQPPGKGLEWIGY       ACCTACAACCCCTCCCTCAAGAGTCGAGTCTCCATATCAATAGACACGTCCAAGAACCAGTTCTCCCTGAACCTGAGGTCTGTGACCGCTGCGGACACGGCCGTGTATTAT TYNPSLKSRVSISIDTSKNQFSLNLRSVTAADTAVYY                   2553.0  159N62=1X10=1X5=1X23=1X42=1X7=1X5=1X11=1X28=1X8=1X31=1X5=1X32=1X11=     41.0    293S47N8=1X2=   260.0   307S26N26=                        51      0       3       160     1       452     293     48      294     58      304     27      308     52      333                                     1 
      293     294     304     308     333                     37340
  2. exportAlignmentsPretty shows reads covering L1 through FR4:

    >>> Read ids: 0
    
               ><L1
                <5'UTR                                     L1>
                 M  K  H  L  W  F  F  L  L  L  V  A  A  P  R  w
    Quality    22622267777777777777777777777776777777777777777777777777777767777777777777777777
    Target0  0 TATGAAGCACCTGTGGTTCTTCCTTCTCCTGGTGGCAGCTCCCAGATGTGAGTATCTCAGGGATCCAGACATGGGGATAT 79  Score
    IGHV4-59*00 20  atgaaAcaTctgtggttcttccttctcctggtggcagctcccagatgtgagtatctcagggatccagacatggggatat 98  3856
    
                                                                <L2     L2><FR1
                                                               w  V  L  S  Q  V  Q  L  Q  E  S
    Quality    77777777777777777777777777777777777777777777777777777777776777777777777777777777
    Target0 80 GGGAGGTGCCTCTGATCCCAGGGCTCACTGTGGGTCTCTCTGTTCACAGGGGTCCTGTCCCAGGTGCAGCTGCAGGAGTC 159  Score
    IGHV4-59*00 99 gggaggtgcctctgatcccagggctcactgtgggtctctctgttcacaggggtcctgtcccaggtgcagctgcaggagtc 178  3856
    
                                                                   FR1><CDR1              CDR1><
                 G  P  G  L  V  K  P  S  E  T  L  S  L  T  C  T  V  S  G  D  S  I  S  R  Y  N  W
    Quality     77777777777777777777777776567767777777775777674777777777737677777577777567767777
    Target0 160 GGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACATGCACTGTCTCTGGTGACTCCATCAGTAGGTACAACT 239  Score
    IGHV4-59*00 179 gggcccaggactggtgaagccttcggagaccctgtccctcacCtgcactgtctctggtgGctccatcagtagTtacTact 258  3856
    
                                                              FR2><CDR2           CDR2><FR3
                  R  W  I  R  Q  P  P  G  K  G  L  E  W  I  G  Y  I  S  Y  T  G  S  T  T  Y  N
    Quality     77777655353577677777776735765555366737766576675767777766767756545675511764567657
    Target0 240 GGAGGTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGATATATCTCTTACACTGGGAGCACCACCTACAAC 319  Score
    IGHV4-59*00 259 ggagCtggatccggcagcccccagggaagggactggagtggattggGtatatctAttacaGtgggagcaccaActacaac 338  3856
    
                P  S  L  K  S  R  V  S  I  S  I  D  T  S  K  N  Q  F  S  L  N  L  R  S  V  T  A
    Quality     76167467776777766676222777677757777777767677777777776646777777776777777677777777
    Target0 320 CCCTCCCTCAAGAGTCGAGTCTCCATATCAATAGACACGTCCAAGAACCAGTTCTCCCTGAACCTGAGGTCTGTGACCGC 399  Score
    IGHV4-59*00 339 ccctccctcaagagtcgagtcAccatatcaGtagacacgtccaagaaccagttctccctgaaGctgagCtctgtgaccgc 418  3856
    
                                  FR3><CDR3    V><D       D>  <J                    CDR3><FR4
                 A  D  T  A  V  Y  Y  C  A  R  E  V  Y  V  N  Y  Y  Y  Y  G  M  D  V  W   G  Q
    Quality     47777677777777777777775777777777777777777777777777777777777777777777777777777777
    Target0 400 TGCGGACACGGCCGTGTATTATTGTGCGAGAGAGGTGTATGTTAATTACTACTACTACGGTATGGACGTCTGGGGCCAAG 479  Score
    IGHV4-59*00 419 tgcggacacggccgtgtattaCtgtgcgagaga                                                451  3856
    IGHD2-8*00  47                                  ggtgtatgCta                                     57   41
    IGHJ6*00  25                                               tactactactacggtatggacgtctggggccaag 58   580
    
                                 FR4>
             G  T  T  V  T  V  S  S _
    Quality     777777777777777777777766646
    Target0 480 GGACCACGGTCACCGTCTCCTCAGGTA 506  Score
    IGHJ6*00  59 ggaccacggtcaccgtctcctcag    82   580
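
The first check above (every clone carries V, D and J calls) can be scripted instead of read by eye. This is a minimal sketch, assuming the exportAirr output is a tab-separated file whose header row contains v_call, d_call and j_call columns as in the example. The function name count_missing_calls is hypothetical and not part of MiXCR.

```shell
#!/usr/bin/env bash

# count_missing_calls FILE
# Prints how many data rows of a tab-separated AIRR export have an empty
# v_call, d_call or j_call. Columns are located by header name, not position.
count_missing_calls() {
  awk -F'\t' '
    NR == 1 {
      # Find the column index of each call field in the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "v_call") v = i
        if ($i == "d_call") d = i
        if ($i == "j_call") j = i
      }
      if (!v || !d || !j) {
        bad = 1
        print "header lacks v_call, d_call or j_call" > "/dev/stderr"
        exit 2
      }
      next
    }
    # Count rows where at least one of the three calls is empty.
    $v == "" || $d == "" || $j == "" { missing++ }
    END {
      if (bad) exit 2
      print missing + 0
    }
  ' "$1"
}
```

Calling count_missing_calls on the exported table prints 0 when every clone has all three calls. A non-zero count is not necessarily an error, since a clone can legitimately lack a D call.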

Actual Result

$ mixcr findAlleles --output-template {file_name}.allelic.clns --export-alleles-mutations allele_stats.tsv --verbose  02-RE13-BLD-Leader_S2_L001.clns
Available CPU: 16, used CPU: unspecified. Available memory: 40960 Mb, used memory: 12000 Mb
Feature for search alleles: {FR1Begin:CDR3End}
Name of the original library: repseqio.v4.0
Step 1 of 7: count diversity for dataset: 0%
App version: 4.6.0; built=Sat Dec 09 14:48:42 EST 2023; rev=c9fafa41fe; lib=repseqio.v4.0
com.milaboratory.app.ValidationException: Feature for allele search doesn't intersect JGene
        at com.milaboratory.o.gP.<init>(SourceFile:122)
        at com.milaboratory.o.gP$b.a(SourceFile:1176)
        at com.milaboratory.mixcr.cli.CommandFindAlleles.run1(SourceFile:353)
        at com.milaboratory.mixcr.cli.MiXCRCommandWithOutputs.run0(SourceFile:69)
        at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-27(SourceFile:514)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at com.milaboratory.mixcr.cli.Main.main(SourceFile:101)
Feature for allele search doesn't intersect JGene

Exact MiXCR commands

mixcr findAlleles --output-template {file_name}.allelic.clns --export-alleles-mutations allele_stats.tsv --verbose 02-RE13-BLD-Leader_S2_L001.clns

mizraelson commented 3 weeks ago

Hi, for this kit the reverse primer is located in the FR4 region, so the clones are assembled by the {FR1Begin:CDR3End} feature to exclude the primer sequence. If needed, you can manually add the following parameter to the mixcr analyze command, and findAlleles should then work:

mixcr analyze invivoscribe-human-dna-ighv-leader-lymphotrack \
--assemble-clonotypes-by VDJRegion \
input_R1.fastq.gz \
input_R2.fastq.gz \
output
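
Combined with the findAlleles call from the original post, the suggestion becomes a two-step re-run. This is a sketch only: the preset name and flags are the ones quoted in this thread, while the wrapper function rerun_with_vdj, the MIXCR variable, and the assumption that analyze writes a file named PREFIX.clns are illustrative and not confirmed here.

```shell
#!/usr/bin/env bash

# rerun_with_vdj R1 R2 PREFIX
# Re-runs the analysis with clonotypes assembled by VDJRegion, then runs
# findAlleles on the result. Set MIXCR=echo to print the commands (dry run);
# the MIXCR override is a hypothetical convenience, not a MiXCR feature.
rerun_with_vdj() {
  local mixcr="${MIXCR:-mixcr}"
  local r1="$1" r2="$2" prefix="$3"

  # Step 1: assemble clonotypes over VDJRegion instead of {FR1Begin:CDR3End}.
  "$mixcr" analyze invivoscribe-human-dna-ighv-leader-lymphotrack \
    --assemble-clonotypes-by VDJRegion \
    "$r1" "$r2" "$prefix" || return 1

  # Step 2: allele search on the re-assembled clones.
  # Assumption: step 1 wrote "${prefix}.clns"; adjust if the file is named differently.
  "$mixcr" findAlleles \
    --output-template '{file_name}.allelic.clns' \
    --export-alleles-mutations allele_stats.tsv \
    "${prefix}.clns"
}
```

For example, `MIXCR=echo rerun_with_vdj input_R1.fastq.gz input_R2.fastq.gz output` prints both commands without running them. Note that, going by the explanation above, assembling by VDJRegion brings FR4 into the clonotype sequence, which is where this kit's reverse primer sits, so differences in that region may deserve extra caution.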