milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

High total sequencing reads but very low used in clonotypes. #230

Closed suntaosimon closed 7 years ago

suntaosimon commented 7 years ago

5'RACE TCR with MiSeq PE300.

mixcr align --chains ALL --species hsa --parameters rna-seq -OallowPartialAlignments=true -OvParameters.geneFeatureToAlign=VTranscript \
    --report alignmentReport.log combined_clean_R1.fastq.gz combined_clean_R2.fastq.gz alignment.vdjca

============= Report ==============
Analysis time: 2.06m
Total sequencing reads: 305092
Successfully aligned reads: 216831 (71.07%)
Alignment failed, no hits (not TCR/IG?): 74690 (24.48%)
Alignment failed because of absence of CDR3 parts: 1965 (0.64%)
Alignment failed because of low total score: 11606 (3.8%)
Overlapped: 115686 (37.92%)
Overlapped and aligned: 38504 (12.62%)
Overlapped and not aligned: 77182 (25.3%)
TRA chains: 6 (0%)
TRB chains: 212193 (99.98%)
TRD chains: 37 (0.02%)
TRG chains: 3 (0%)
TRA,TRD chains: 3 (0%)

$ mixcr assemble --report clonReport.log alignment.vdjca alignment.clns
Initialization: progress unknown
Assembling initial clonotypes: 28.1%
Assembling initial clonotypes: 60.1% ETA: 00:00:01
Assembling initial clonotypes: 92.7% ETA: 00:00:00
Mapping low quality reads: 48.6%
Clustering: 21.5%
Building clones: 3.6%
Building clones: 83.9% ETA: 00:00:00
Writing clones: 0%

============= Report ==============
Analysis time: 7.77s
Final clonotype count: 7924
Average number of reads per clonotype: 1.96
Reads used in clonotypes, percent of total: 15548 (5.18%)
Reads used in clonotypes before clustering, percent of total: 15548 (5.18%)
Number of reads used as a core, percent of used: 12995 (83.58%)
Mapped low quality reads, percent of used: 2553 (16.42%)
Reads clustered in PCR error correction, percent of used: 0 (0%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Reads dropped due to the lack of a clone sequence: 103148 (34.35%)
Reads dropped due to low quality: 231 (0.08%)
Reads dropped due to failed mapping: 93315 (31.08%)
Reads dropped with low quality clones: 0 (0%)
Clonotypes eliminated by PCR error correction: 0
Clonotypes dropped as low quality: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
TRB chains: 7924 (100%)

What can these reports tell us? (Total sequencing reads: 305092, but Reads used in clonotypes, percent of total: 15548 (5.18%).)

What does "Reads dropped due to the lack of a clone sequence" mean?

Thanks!

dbolotin commented 7 years ago

To work out the reason, please post the first 10 alignments from the vdjca file here:

mixcr exportAlignmentsPretty -n 10 alignment.vdjca

Reads dropped due to failed mapping: 93315 (31.08%) and Mapped low quality reads, percent of used: 2553 (16.42%) also indicate low data quality. You can lower the quality threshold (e.g. -ObadQualityThreshold=10) to include lower-quality reads in your analysis.
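For orientation, the counts in the two reports above can be tallied directly. The short Python sketch below only re-adds numbers copied from those reports; the dictionary names and the `breakdown` helper are illustrative, not part of MiXCR. It shows that the reads used in clonotypes plus the three "dropped" categories add up to the same number as the per-chain counts in the align report, and what share of that pool each category takes.

```python
# Counts copied from the assemble report above.
assemble_counts = {
    "used in clonotypes": 15548,
    "dropped: lack of a clone sequence": 103148,
    "dropped: low quality": 231,
    "dropped: failed mapping": 93315,
}

# Per-chain counts copied from the align report above.
chain_counts = {"TRA": 6, "TRB": 212193, "TRD": 37, "TRG": 3, "TRA,TRD": 3}


def breakdown(counts):
    """Return {category: (count, percent of the summed counts)}."""
    total = sum(counts.values())
    return {name: (n, 100.0 * n / total) for name, n in counts.items()}


accounted = sum(assemble_counts.values())
chain_total = sum(chain_counts.values())

for name, (n, pct) in breakdown(assemble_counts).items():
    print(f"{name:36s} {n:7d}  {pct:6.2f}%")
print(f"accounted for in assemble: {accounted}, chain total in align: {chain_total}")
```

Note that the percentages printed here are relative to the reads accounted for in the assemble step, so they differ from the "percent of total" figures in the report itself.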

suntaosimon commented 7 years ago

>>> Read id: 1

              <5'UTR                                                                             
   Quality    66666677467770242577777577667477265266777777276667777777777777755646764677776777   
   Target0  0 TAGGTTAACGCAGCGGTATCAACGCAGAGTACGGGGTTCCCCTTTCATCAATGCACAGATACAGAAGACCCCTCCGTCCT 79  Score
TRBV6-6*00 12 GaggtCTCAgAaTGACt-tcCTTg-agagt-c-CTgttcccctttcatcaatgcacagatacagaagacccctccgtcct 87  714

                    5'UTR><L1                                           L1><L2  L2><FR1           
   Quality    74224267726774574457777777557777742447245222467777655222225222564424222556774626    
   Target0 80 GGGGCACCTTCCATGAGCATCAGCCTCCTGTGCTGTGCCGCCTTTCCTCTCCTATGGGCAGGTCCAGTGAATGCTGGTGT 159  Score
TRBV6-6*00 88 ggAgcacctGccatgagcatcagcctcctgtgctgtgcAgcctttcctctcctGtgggcaggtccagtgaatgctggtgt 167  714

                                                                              FR1><CDR1     CDR    
   Quality     57667624766746722266267253552266254576777452323242222422245675244524323226522662    
   Target0 160 CACTCATACCCCAAAATTCCGCATCCTGATGATCGGCCAGAGCATGCCACTGCAGTGTACCCAGGATATGACACATAAAT 239  Score
TRBV6-6*00 168 cactcaGaccccaaaattccgcatcctgaAgatAggAcagagcatgAcactgcagtgtacccaggatatgaACcataaCt 247  714

               1><FR2                                                          
   Quality     555221312226645572511141555151456611332122322522322222322222    
   Target0 240 ACAACAACTGGTATCCACAACACCCAACCACGGGAAAACACCCAAAAGAAAAAGAAGAGA 299  Score
TRBV6-6*00 248 acaTGTactggtatcGacaaGaccca                                   273  714

Quality   23232522222223232252323533223221455143131111114222656264552642225424225667777542   
Target1 0 TCGACAAGCTCCTGTAATGGGTCTGAAGCTGTTTTTTTTTGCCGTGGGTGCTGTTATCCCTTATAAAGGAGAAGTCCCGA 79  Score

Quality    47446431015161144024651674222625577757274756477555467777476752327775247654777677    
Target1 80 ATGGCTACGACGTCTCCGTATCAACCACAGAGGATTTCCCGCTCAGGCTGGAGTTGGCTGCTCACTCCCAGACATCTGTG 159  Score

                                       <J           CDR3><FR4                    FR4><C            
   Quality     55754777777777464776267655477777747777767777774774756554427776477664552742656624    
   Target1 160 TACTTCTGTGCCAGCAGTACATCGACAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCGAGGACCTGAA 239  Score
TRBJ2-3*00  23                         acagatacgcagtattttggcccaggcacccggctgacagtgctcg           68   230
  TRBC2*00   0                                                                       aggacctgaa 9    260
  TRBC1*00   0                                                                       aggacctgaa 9    228

 Quality     462777764276646776777777777777777747767477677777767777766666    
 Target1 240 AAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTAGGTCGA 299  Score
TRBC2*00  10 aaacgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCCAc-a 68   260
TRBC1*00  10 CaaGgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCCAc-a 68   228

>>> Read id: 2

             <5'UTR                                                  5'UTR><L1                   
  Quality    44646772577460244446667676257777726772467747255 55747742462267422246222424267777    
  Target0  0 CCCGAATAAGCAGGGGTATCAACGCAGAGTACGTGATTCCTGTATGG-GTGGTATTCCAGCCATGGGTCCTGTGCTTCTC 78   Score
TRBV15*00 40 cAGgaatCag-agCCTGaGACaGAcaga-tGcTtCattcctgtatggGgtggtattccTgccatgggtcctgGgcttctc 117  519

                                          L1><L2  L2><FR1                                         
  Quality     25667646267257522672276442464722456224233222462225642225525642572745676527772256    
  Target0  79 CACTGGATGGCCCTTTGTCTCCTTGTAACAGGTCATGGGGATGACATGGTCCTAACGAACCCAACATACCAGGTTACTCA 158  Score
TRBV15*00 118 cactggatggccctttgtctccttgGaacaggtcatggggatgCcatggtcAtCCAgaacccaaGataccaggttacCca 197  519

                                               FR1><CDR1                                          
  Quality     66656552622232674777457762232222235522224252242522222222225225322222324222266222    
  Target0 159 GTTTGGAAAGCAAGTGACCCTGAGTTGTTCTCAGACGATGAACCATAACAAGATGATCTGCTATCCACATACATACAATC 238  Score
TRBV15*00 198 gtttggaaagcCagtgaccctgagttgttctcagacTTtgaaccataac                                246  519

Quality     2242222224232232222424323222336526736112245223222224232221225    
Target0 239 AATACCACAAAATACTGACCCAATAAAATAACAAAAAGCTAAAAAAGGCAACAAAAAACCC 299  Score

                                   FR2><CDR2         CDR2><FR3                                    
  Quality     3122532222225243253532 222652323232322232222332227763223111155554222241221122213    
  Target1   0 TCCTGCCCTATAGCTGTTTTTC-ACTTACTATTACATATAGGTTAACATTGAAGCACACCCCCCTTATAATGTCCCATCC 78   Score
TRBV15*00 276 tcAGgcccCaAagctgCtGttcCact-actatGacaAaGaTTttaacaAtgaagcaGacAcccctGataaCTtccAatcc 354  367

                                                                                  FR3><CDR3       
  Quality     25515114103222246565312277753451665115277322322246567746542255622455232625546477    
  Target1  79 AGGTGGCCGTACATTTCTTTCGGCTGTCTTGCCATCCGCTCACCAGGCCTGGGGGACGCAGCCATGTACCTGGGTGCCAC 158  Score
TRBV15*00 355 aggAggccgAacaCttctttcTgctTtcttgAcatccgctcaccaggcctgggggacAcagccatgtacctgTgtgccac 434  367

                    V>               <D D><J         CDR3><FR4                    FR4><C           
   Quality     55422642664662772622666324574626566777767677776677647627622567777567625664262677    
   Target1 159 CAGCAGACATCTGTACAGTCCTCTCCCTTCACCCCTCCACTGTGGGAATGGGACCAGGCTCACTGTGACAGAGGACCTGA 238  Score
 TRBV15*00 435 cagcaga                                                                          441  367
  TRBD2*00   2                       ctccc                                                      6    25
TRBJ1-6*00  29                            ttcacccctccactTtgggaaCgggaccaggctcactgtgacag          72   188
  TRBC1*00   0                                                                        aggacctga 8    244

 Quality     7762777776556266476266642777764777647777776574777744247766666    
 Target1 239 ACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCGTTGAGGA 299  Score
TRBC1*00   9 acaaggtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatc-tCCCACa 68   244

>>> Read id: 3

Quality   46666726777760526747677676777677777664277477477774276426765465777677677222466777   
Target0 0 TTACTATAAGCAGAGGTATCAACGCAGAGTACGGGATTCTTTCTTCAAAGCAGCCATGGTAATCAGGCTCCTCTGTCGTG 79  Score

Quality    66264577676266562452224224224566246222252265225264723465552352522226622652642222    
Target0 80 TGGCCTTTTGTTTCCTGGCAGTAGGCCTCGTAGATCTGCACGTACACCAGAGCTCGAGATATCAAGTCACAAGGAAGGGA 159  Score

Quality     42526647722472266245525632223255226225237647452232222235226265241555522211111312    
Target0 160 GTGAAAGTTGTTCGTGACTGTATCCAGGAAATTGAGCATGAAAAAATGTACTGGTATTGACACAAAACAAGCCGGTGGAT 239  Score

Quality     223111111124262322323223223223252113511335646722522223323222    
Target0 240 ACGGCGGCACTATTTCAACTTATATGTTAACAAAAAAGCAAAAGGAGGACTTACAGAGGG 299  Score

                                               FR2><CDR2        CDR2><FR3                         
  Quality     32522332222255525421111222552525476352667676346766453233222252322222222613351146    
  Target1   0 AGAAGACTCAGGGTTGGGGCTTCGGGTGATCTATTTCTCATATGATGTTAAAAAGAAAGAAAAAGGAGATTTTCCTGGGG 79   Score
TRBV28*00 264 aCaagacCcaggTCtggggctAcggCtgatctatttctcatatgatgttaaaaTgaaagaaaaaggagatAttcctgAgg 343  605

                                                                                             F    
  Quality     27641113042556255522422221175443422476477776256265312524765362445647762625264226    
  Target1  80 TGTACAGTCTGTCTAGAGAGAATAAGGAGCGCTTTTCCCTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTCTGTAC 159  Score
TRBV28*00 344 GgtacagtGtCtctagagagaaGaaggagcgcttCtccctgattctggagtccgccagcaccaaccagacatctAtgtac 423  605

                      V>                                                                           
               R3><CDR3               <J            CDR3><FR4                    FR4><C            
   Quality     27775525266647655226222764274777777577667777477777466567747776477777777777765727    
   Target1 160 CTCTGTGCCTACGGGGTTGGCGACACAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCGAGGACCTGAA 239  Score
 TRBV28*00 424 ctctgtgcc                                                                        432  605
TRBJ2-3*00  22                        cacagatacgcagtattttggcccaggcacccggctgacagtgctcg           68   235
  TRBC2*00   0                                                                       aggacctgaa 9    308
  TRBC1*00   0                                                                       aggacctgaa 9    276

 Quality     777777777666627776777777777777777777777777777777777777766266    
 Target1 240 AAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTGCGCACA 299  Score
TRBC2*00  10 aaacgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatct-cCcaca 68   308
TRBC1*00  10 CaaGgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatct-cCcaca 68   276

>>> Read id: 4

Quality   66322232663222232223226524332323253232323222222322222232755141245131442562232427   
Target0 0 TTTCCGTCTTGTGGTTTAAGGGTTGTGTTGGTATTTTAGTTAAGAGTGTATCCTTTGTTTTGGGCTTTGTTTTCAGTTGT 79  Score

Quality    62244111023215465775474664536676645625577777777755352277766256247762667666767765    
Target0 80 TGTTGAAGTAAGAAGACGGCATACGAGCTGTCGCGGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGTGTCATAA 159  Score

                                          <J        CDR3><FR4                    FR4>              
   Quality     77627764725667777777777777777777777777777777777777777777777777777777777777757777    
   Target0 160 GCAGTGGTATCAACGCAGAGTACGGGGACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAGAGGACCTGAA 239  Score
TRBJ1-1*00  25                            actgaagctttctttggacaaggcaccagactcacagttgtag           67   215

Quality     77777777777777767777777777667777766376476257556275777776646674722425663246236653    
Target0 240 CAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTAAATCTAAGAACGAAAGCGCGCCTGGT 319  Score

Quality     24226556662233222221412324223622423234225411111422254525111141111333222232222252    
Target0 320 CGGGAAAGAGACTCATCGACCTGGAAACTACTGGTAGTCACCGAAAAATGCAAAAAAAACAAACCATAATCACTAAGAAT 399  Score

Quality     322356525223221354122112315222223222222233252222133    
Target0 400 CTCAAAAGACTGCCGAATCGGCGAAGAAAAGCAAATGATCAACAAAAAAAC 450  Score

>>> Read id: 7

Quality   11776526322221222321462212323231323222366113223315141516444652564231332314317765   
Target0 0 TTTTTTTTTGTTTGTGGGGTTTTTTTTTTTTCTGTGTTATGTTTTTGTCTTTTTTTTTTTTTTGTCACGCAGAAGACGGC 79  Score

Quality    65546226751115552525567763247453415652232476777632237666777777777657767777676777    
Target0 80 ATACGAGATTTCGCGGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCTTTAAAGCAGTGGTATCAACGCAGAG 159  Score

                                         <J         CDR3><FR4                    FR4>              
   Quality     77777777777777777777767777777777777777777777777777777777777777777777777777777777    
   Target0 160 TACGGGGTGGGGACTAGCGGGAGTCGCAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAA 239  Score
TRBJ2-1*00  26                           caatgagcagttcttcgggccagggacacggctcaccgtgctag           69   220

Quality     77777777777776577777777777577777777777777777777777777776666624622557766456111114    
Target0 240 AAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTTTGTTGTAGATCGGAAGAGCGTCGGGT 319  Score

Quality     56765745222222222523251132522322312111123111211232213311513233121212112232321322    
Target0 320 AGGGAAAGAGTGATTTAGCCTGGGTAGAGCTTGGGGGGGGACGAATCGTAAAACAAAAACAACAAAAAAATAAAACAATA 399  Score

Quality     3352111212232232642222232235422    
Target0 400 CAACACAAATAAAAAAACATGAAGGAAAGCA 430  Score

>>> Read id: 8

               <5'UTR                                                            5'UTR><L1         
    Quality    6666677675456056777 667766777264557265657774266676767777747777777457774625677757    
    Target0  0 ATGGCTGAAGCCGGGGTAT-CAACGCAGAGTACGGGAGAGCTGGAAACACCTCCATCCTGCCTCTTCATGCCATGGCCTC 78   Score
TRBV24-1*00 29 TtTTctATTTccATgCCCtGcTTcCcTCaACaTCC-agagctggaaacacctccatcctgcctcttcatgccatggcctc 107  491

                                                      L1><L2  L2><FR1                               
    Quality     76626672222464777224765777667577742626235224266476224657722222245657777746677562    
    Target0  79 CCTGCTCTTCTTCTGTGGGGCCTTTTATCTCCTGGGAACCGGGTCCATGGCTGCTGAAGTTAACCAGACCCCAAGGAATA 158  Score
TRBV24-1*00 108 cctgctcttcttctgtggggccttttatctcctgggaacAgggtccatggAtgctgaTgttaCccagaccccaaggaata 187  491

                                                            FR1><CDR1                               
    Quality     422232 3265677526222365774233222114324766265242647254242322246152222411135222242    
    Target0 159 GGCAAC-CAAAGACAGGAAAGAGGATTATGCCGAAATGTTCTCAGACTAATGGTCCTGCAAGCATTAACGAGTATGACCA 237  Score
TRBV24-1*00 188 gg-aTcAcaaagacaggaaagaggattatgcTgGaatgttctcagactaaGggtc                          241  491

Quality     11315633113323265231311113352322322232222323236742222223222322    
Target0 238 CGACCAAGGCATAGCACACCGGTTGATCCATTAAAACTATCTAGGACAAACACAGAAACAAA 299  Score

                                        FR2><CDR2         CDR2><FR3                                 
    Quality     32222522224432224222525333223322652222213322 33322232233223232252334225522532432    
    Target1   0 TTTATTGGTTTTTGGTTTGTTGTTGTACTCTTTTGATTGTAAAA-ATATAATCAAAGGATATGTTTCTTATGTATACAGT 78   Score
TRBV24-1*00 273 AGGaCtggGCCtACgGttgAtCtATtactcCtttgat-gtCaaaGatataaAcaaaggaGaGAtCtctGatgGatacagt 351  307

                                                                                       FR3><CDR3    
    Quality     21661634222747774522542151415145152246562222322162526625422254252322566445265552    
    Target1  79 GTCTCTCGACAGGCACAGGCTAATTTCGCCCTGTCCCTAGAGTCTGACATCCCCCACCCGACAGATCGTTACTTCTGGGC 158  Score
TRBV24-1*00 352 gtctctcgacaggcacaggctaaAttcTccctgtccctagagtctgCcatccccAaccAgacagCtcTttacttctgTgc 431  307

                                 <JP                                                                
                         V>       JP><J               CDR3><FR4                    FR4><C           
    Quality     76626554252622266667552225252222626223565432772625652242766777576477747647577776    
    Target1 159 CACCAGTGATTGGAGCGGGAGCTCTTACAAGGAGCAGTTCTTCTGGCCAGGGTCACGGCTCACCGTGCTAGAGGACCTGA 238  Score
TRBV24-1*00 432 caccagtgatt                                                                      442  307
 TRBJ2-1*00  16                  ggagctcCtacaaTgagcagttcttcGggccagggAcacggctcaccgtgctag          69   206
   TRBC2*00   0                                                                        aggacctga 8    254

 Quality     2756777777642467777777776776427762727777677777776725777766666    
 Target1 239 AAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTAGACCTG 299  Score
TRBC2*00   9 aaaacgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCCAcAC 69   254

>>> Read id: 9

Quality   26462467677770556257257667677667667422222233242362243226624322432225225232666722   
Target0 0 GCCGGGCAAGCAGCGGTATCCACGCAGAGTACGGGGGGCACAAGCCTCCCATGCTGTTTGGACCACGGGTTCACCGGGAA 79  Score

Quality    52332222222124524222542667453226552214656264245755111131141132645222324252234111    
Target0 80 AGAGGAGAGGGGTGCTGAAGTCTCCCACTCTCCCACGTACACAGACACAACGCAGGGACAGTATTAAGCTCTCAGGGGGG 159  Score

Quality     11154444225231111333321111112222232222132222321311511564113512323223222235222222    
Target0 160 AGACAATCTGGGGCCGCGTTTCGCTTGCTCACTACCTCCAGTCTCGGGGGCAGGGCACACATTGTCAGCATCACTACAAT 239  Score

Quality     223223332111135112511123333225222223312131154422232222112112    
Target0 240 TCTTCAGCCCACCAAAACAACATAGGAACTCTCAGACGCGCGAACCCTCACGACCCCCCG 299  Score

Quality   12211453222125223223222242322323223222246515143233221132222213131131332245212322   
Target1 0 CGTGGGTCGGTGCCACGAGTCTTTGCTTTATTTAATATATGATGCCCAACATGCCCAACCAGGGTGGCCCACTGATTGGT 79  Score

Quality    55642222434111111222226335252646424112413115115311222321454122354267777746622222    
Target1 80 TCTCTGCAGAGTGGGCTGGTGTATCCTTCTCCACTGGTACGATCTCGCTCCCTGTGCAGCGTGACTCGGCCATGTATCGC 159  Score

                                           <J       CDR3><FR4                    FR4><C            
   Quality     42244522222222252655747764224642266552277767677776462767766766767576777777467674    
   Target1 160 GGGGCCAGAAGGATGACAGGGGTAAGGGACGAGCAGTCCTTCGGGCCGGGCACCAGGCTCACGGTCACAGAGGACCTGAA 239  Score
TRBJ2-7*00  25                             acgagcagtActtcgggccgggcaccaggctcacggtcacag           66   194
  TRBC2*00   0                                                                       aggacctgaa 9    270
  TRBC1*00   0                                                                       aggacctgaa 9    238

 Quality     646776777775777776727777777777777767477777477777777776425666    
 Target1 240 AAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTGACGCCT 299  Score
TRBC2*00  10 aaacgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCcAcAC 69   270
TRBC1*00  10 CaaGgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCcAcAC 69   238

>>> Read id: 10

Quality   66666775767670556777445246777677764467777777777764776476767677672677676477224252   
Target0 0 AAGTCACAAGCAGAGGTATCAACGCAGAGTACGGGGAAAACATCCTGAGGACAGTGCCTGGAGGTGAGAAGGAAGCCACC 79  Score

Quality    57267777726466565577777777522522626676624677622252357645567723457415333356115553    
Target0 80 AGCCTGGTCCATACCCCACCACCAACTTGCATAATGGGGGGTGATATCACCCACCCTCCAACCCCCTCACAGGAGCAGCT 159  Score

Quality     64262622453525151564411122523411313441222135151512336645223322222335642322213313    
Target0 160 GCTCTGGTGTGCTCGCCCAGGCCCATGGGGCAGAACCCTGGGAGGGGCAGTTTTGTCTAAAACTGTAACATTGGGGGGAC 239  Score

Quality     312321133354223223223564232122211352521112325225112232251152    
Target0 240 AGCAGGAGAAAATGAGAATATCTTAGGGCCCCTGACACGAGCCACCGATCAGGCGGAACA 299  Score

Quality   32322322274477762222522522631153121113332222332233177532244646653163111312111153   
Target1 0 GTTGGAATGAAGCCCCCAGCCTGGTCCAATCCCCCCCCCCAAGTTGTATATTGGGGGGTGATGTCACCCACCCTCCTCTC 79  Score

Quality    12742225532345536222211122562227777766261675664224275677455555632422777752276564    
Target1 80 CCCTCAAAGGGGCAGCTGCTCTGGGTGTCTTTCCCAGGCTCTGGGGGCGGACCCATGGGATGGGCTTTTTTGTACAAAGC 159  Score

                                      <J            CDR3><FR4                    FR4><C            
   Quality     77777674657777567777777662662267767777667646476777764647767767777774776776477762    
   Target1 160 TGTAACATTGTGGGGACAGAAGGCTACAAGGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAA 239  Score
TRBJ2-1*00  23                        ctacaaTgagcagttcttcgggccagggacacggctcaccgtgctag           69   219
  TRBC2*00   0                                                                       aggacctgaa 9    222

 Quality     662777777776777774777677777652776747765477477777777777766666    
 Target1 240 AAACGTGTTCCCACCCGAGGTCGCTGTGTGTGAGCCATCAGAAGCAGAGATCTATGGGTT 299  Score
TRBC2*00  10 aaacgtgttcccacccgaggtcgctgtgtTtgagccatcagaagcagagatctCCCACAC 69   222

>>> Read id: 11

              <5'UTR                                                           5'UTR><L1          
   Quality    666664677777702466767  226 666767756266 2666556765652725244744627626266222524462    
   Target0  0 TACTCTCAAGCAGTTGTATCA--ACG-CAGAGTACGGGG-ACAGACACAGTGATGCCTGCCCCTTTGTGCCATGGGCTCC 75   Score
TRBV5-1*00 32 GaAtTt-a-gcTCtt-tCCcaGGaGgAcCAagCCcTgAgCacagacacagtgCtgcctgcccctttgtgccatgggctcc 108  190

                                                    L1><L2  L2><FR1                                
   Quality     24542452622442446623262462672677225222622232235524222225422222322222526222426262    
   Target0  76 CGGCTGCTATGTTGGGTGGTGCCTTGTCTCCTGTGAACAGGACCAGTAAAAGATTGAGTCGCTCAAACTCCCACATATCT 155  Score
TRBV5-1*00 109 AggctgctCtgttgggtgCtgcTttgtctcctgGgaGcaggCccagtaaaGgCtGgagtcActcaaactccAaGatatct 188  190

   Quality     56622566325522267732252522242222222243162424222222342221151346262241133365577711    
   Target0 156 GATCCAACAGAACAGACAGGCACTGGCACTCAGCCGCTACCATAACAATTGCCCACGGAGTTAACACCGGTAAACACAGC 235  Score
TRBV5-1*00 189 gatc                                                                             192  190

Quality     2244545115131311122223223522222323322211222332522222333622223555    
Target0 236 AACCACACCAAGGAGCACAGAGACACTTTCCATACACAAATGAGAACACAAAACACAAAACAAA 299  Score

                                              FR2><CDR2        CDR2><FR3                           
   Quality     322222225332222223122176522223224523223 3235226547765224453235223231131133123232    
   Target1   0 GCTCTCAGTACATGGCCTTTTTTTCCGCTTTGAAGTAGT-CAGTGTGACACAGAGAAACTAAGTCAACTTCCCTTTTCGA 78   Score
TRBV5-1*00 267 gACcCcagGacaGggccttCAGttccTctttgaa-taCtTcagtgAgacacagagaaacAaagGAaacttccctGGtcga 345  385

                                                                                            FR3    
   Quality     32341113155245656556613353145632552676642256526421541225456127526672255476276272    
   Target1  79 TTCTTCTGCCGCCAGTTCTCTAACTCGCTCTCTGAGATGAATGTGAGCACCTGGGAGCTGGGGTACTCCGCCCTGTAGCT 158  Score
TRBV5-1*00 346 ttctCAGgGcgccagttctctaactcTcGctctgagatgaatgtgagcacctTggagctggggGactcGgccctTtaTct 425  385

                               DP><D                                                               
               ><CDR3         V><DPD>      <J        CDR3><FR4                    FR4><C           
   Quality     77555777764752655646776362646562747577577777642777777777777566767762777664776776    
   Target1 159 TTGCGCCAGCAGCGTGGCCGGGGGGTGGACTTAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAGAGGACCTGA 238  Score
TRBV5-1*00 426 ttgcgccagcagcTtgg                                                                442  385
  TRBD1*00  10                  ccggg                                                           14   25
  TRBD2*00  14                  ccggg                                                           18   25
TRBJ1-1*00  25                             actGaagctttctttggacaaggcaccagactcacagttgtag          67   199
  TRBC1*00   0                                                                        aggacctga 8    254
  TRBC2*00   0                                                                        aggacctga 8    222

 Quality     7647777564427642676777777777776762767775677477774777777766666    
 Target1 239 ACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTTAAAAGT 299  Score
TRBC1*00   9 acaaggtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCCaCAC 69   254
TRBC2*00   9 aAaaCgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCCaCAC 69   222

>>> Read id: 12

Quality   22222422577770567777245676775776677752224225242542522246477757242222424242224525   
Target0 0 ATCGCGAAAGCAGAGGTATCAACGCAGAGTACGGGAAAAACTCCCCTGAAGAGTCCATGGGCCCGAACCTACTTACCTGT 79  Score

Quality    24222462425222256465242222422253245222243624222222262222224326253232265222222224    
Target0 80 GTGCTCCTGTGCCTTCTGGGTTCGGGCCACCAGGAGCAGCAATTTTGCAAAAAGATCGGATACCCGACCACTATGTATGG 159  Score

Quality     22252322245241413122256225262224221131113222224221111433232411111111511311222223    
Target0 160 CAAGTCGTCAACCGTCAAATGTTCTCTGGATCCGACACCATCAGACACAGCCAGCTATCGACCAAACACAGGACTAGGAT 239  Score

Quality     222233222222231111357741311233232211211222322242223223322233    
Target0 240 TAAGGAATACACCAAACACAACGAATAGTGCAAGGCCAGATAAGGAACAAGTCAATGAAA 299  Score

                                            FR2><CDR2        CDR2><FR3                            
  Quality     53111112522654745455332245312255452222322165253226212222253233222257461313222323    
  Target1   0 TTTTCCCGGGTTTGGCTTAAGGCTTATCGTCTATTCAAGGAATGTTGATGTTACGGATAAGGTGTATGTTCCGGATGGGT 79   Score
TRBV27*00 267 AGACccAgggCtGggcttaaggcAGatcTActattcaaTgaatgttgaGgtGacTgataaggGAGatgttccTgaAgggt 346  336

                                                                                          FR3>    
  Quality     22322211131532324522222242223133122222325231516432131162265262225565422677577422    
  Target1  80 ATAAATGCTCTCGCAAAGAGAAAAGGAATTTCCCCCTGATCCAGGTGTCGCCCAGCCCCTACCGTACCTTCCTGTACTCC 159  Score
TRBV27*00 347 aCaaaGTctctcgAaaagagaaGaggaatttccccctgatccTggAgtcgcccagccccAaccAGacctCTctgtactTc 426  336

               <CDR3         V>  <D D>           <J CDR3><FR4                    FR4><C            
   Quality     64522226326222542526426224254525522545252762622452647777744552245427777777777777    
   Target1 160 TGTGCCAGCGGTTTATACGGGGGTTGGCGTGGGAAGTTCTGCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAA 239  Score
 TRBV27*00 427 tgtgccagcAgtttat                                                                 442  336
  TRBD1*00  18                   ggggg                                                          22   25
TRBJ2-1*00  34                                   agttctTcgggccagggacacggctcaccgtgctag           69   164
  TRBC2*00   0                                                                       aggacctgaa 9    270
  TRBC1*00   0                                                                       aggacctgaa 9    238

 Quality     777777776756577777777777777777776747777677764577777777766466    
 Target1 240 AAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTAGTAAAA 299  Score
TRBC2*00  10 aaacgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCCaCaC 69   270
TRBC1*00  10 CaaGgtgttcccacccgaggtcgctgtgtttgagccatcagaagcagagatctCCCaCaC 69   238

Filtered: 11 / 11 = 100.0%

PoslavskySV commented 7 years ago

As I see it, there are two reasons why the number of reads used in clonotypes is so small:

  1. Reads dropped due to the lack of a clone sequence: 103148 (34.35%). This means that 34.35% of successfully aligned reads do not cover the CDR3 region; nothing can be done about this.
  2. Reads dropped due to failed mapping: 93315 (31.08%). This means that 31.08% of successfully aligned reads have mediocre (but not radically bad) sequence quality. MiXCR tries to recover such low-quality reads by mapping them to already assembled clonotypes, as described in the MiXCR documentation, but for these reads the mapping was unsuccessful. You can try lower quality thresholds, as in the command below, so that MiXCR treats even low-quality reads as "good enough":
mixcr assemble --report clonReport.log -ObadQualityThreshold=10 -OmaxBadPointsPercent=0.9 alignment.vdjca alignment.clns
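
To make the effect of the two options more concrete, here is a deliberately simplified Python sketch of how a per-read decision based on them might look. It is an assumption based on my reading of the parameter descriptions, not MiXCR's actual code: the function names, the exact comparison operators, and the example Phred scores are all hypothetical. The idea is that a read with no "bad" bases in its CDR3 can seed a clonotype, a read with some bad bases is deferred to the mapping step, and a read with too many bad bases is dropped.

```python
def bad_point_fraction(phred_scores, bad_quality_threshold):
    """Fraction of positions whose Phred score is below the threshold."""
    bad = sum(1 for q in phred_scores if q < bad_quality_threshold)
    return bad / len(phred_scores)


def classify_read(phred_scores, bad_quality_threshold, max_bad_points_fraction):
    """Simplified, hypothetical three-way decision for one CDR3 sequence."""
    fraction = bad_point_fraction(phred_scores, bad_quality_threshold)
    if fraction == 0:
        return "core"                  # can seed a clonotype directly
    if fraction <= max_bad_points_fraction:
        return "deferred to mapping"   # must be mapped onto a core clonotype
    return "dropped"                   # too many low-quality positions


# Hypothetical Phred scores for the CDR3 bases of one read.
scores = [35, 35, 12, 35, 18, 35, 35, 35]

print(classify_read(scores, 20, 0.7))  # stricter, illustrative settings
print(classify_read(scores, 10, 0.9))  # relaxed settings from the command above
```

With the stricter illustrative settings the read has two bad positions and would have to be mapped; with the relaxed settings none of its bases count as bad, so it could be used as a core read.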
suntaosimon commented 7 years ago

Thanks PoslavskySV!

BTW, what's the algorithm for mapping CDR3? The reason I am asking is that we recently found something new in our data. To the best of my knowledge, random nucleotides may be inserted at the V-D and D-J junctions of TCRb. But in our data we also noticed that, in most of the sequences, there are nucleotide mutations at the end of V, at the beginning of J, and sometimes on both sides of D (manually compared with the IMGT database), which means there are mutations in the parts of the V, D and J segments that belong to the CDR3 region.

I don't know whether this finding is real or an artifact of our analysis, so I would like to know how MiXCR does the VDJ mapping. If it first maps the CDR3 region and there are mutations in that region, I don't understand how the CDR3 could be mapped. If MiXCR first maps the beginning of V or the end of J, it might be possible to pick up these mutations.

mikessh commented 7 years ago

Dear Sun,

BTW, what's the algorithm for mapping CDR3?

You mean the V/D/J mapping inside CDR3? There is a pass of local alignment that refines the V/D/J boundaries.

Random nucleotide insertion at the VD and DJ junctions is one of the fundamental steps of VDJ rearrangement (there is a huge body of relevant literature on this).

there are nucleotide mutations at the end of V, at the beginning of J, and sometimes on both sides of D (manually compared with the IMGT database)

These are most likely PCR/sequencing errors, or some unknown alleles. To claim V/J mutations similar to antibody hypermutations one needs to carefully correct for these errors using a dedicated library prep protocol and compare DNA/RNA.

I don't know whether this finding is real or an artifact of our analysis, so I would like to know how MiXCR does the VDJ mapping.

This is an extremely broad question; please refer to the MiXCR paper.

If it first maps the CDR3 region and there are mutations in that region, I don't understand how the CDR3 could be mapped.

This one is unclear.

If MiXCR first maps the beginning of V or the end of J, it might be possible to pick up these mutations.

MiXCR maps both V and J, finds the reference points (conserved Cys/Phe) and tags the sequence between them as CDR3. In any local alignment algorithm there is a trade-off between a permissive mapping that produces false positives and a more conservative one.

These problems are intrinsic to any deterministic alignment algorithm and can be solved with more complex (and extremely slow) probabilistic modeling of VDJ rearrangement (see the work of Walczak et al.).
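
As a rough illustration of the reference-point idea mentioned above (CDR3 as the stretch from the conserved Cys to the conserved Phe), the Python sketch below scans a nucleotide sequence for that pattern at the protein level. This is a toy motif search and explicitly not how MiXCR works: MiXCR takes the reference points from its V and J germline alignments. The regular expression (a Cys followed, at the shortest distance, by Phe/Trp and a G-x-G motif) and all function names are assumptions made for the example. The input is the part of Target1 of Read 1 shown earlier in this thread that carries the CDR3.

```python
import re
from itertools import product

# Standard genetic code, built from the usual TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Toy anchor pattern (an assumption, not MiXCR's method): conserved Cys,
# then the shortest stop-free stretch ending in Phe/Trp followed by G-x-G.
CDR3_PATTERN = re.compile(r"C[^*]{2,40}?[FW](?=G.G)")


def translate(nt, frame):
    """Translate nt in the given reading frame; unknown codons become 'X'."""
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def find_cdr3(nt):
    """Return (frame, amino acid CDR3, nucleotide CDR3) or None."""
    nt = nt.upper()
    for frame in range(3):
        match = CDR3_PATTERN.search(translate(nt, frame))
        if match:
            start = frame + 3 * match.start()
            end = frame + 3 * match.end()
            return frame, match.group(0), nt[start:end]
    return None


# Positions 160-239 of Target1, Read id 1, copied from the alignment above.
read1 = ("TACTTCTGTGCCAGCAGTACATCGACAGATACGCAGTATTTTGGCCCAGGC"
         "ACCCGGCTGACAGTGCTCGAGGACCTGAA")

print(find_cdr3(read1))
```

A search like this has no notion of germline sequence, so it cannot tell a true mismatch at the end of V or the start of J from a sequencing error or an unknown allele; that distinction is exactly what the alignment-based approach, and the error correction discussed above, are for.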