milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Crash when using "mm" as species name #69

Closed: fabio-t closed this issue 8 years ago

fabio-t commented 8 years ago

From within the mixcr installation directory, I ran importFromIMGT.sh, requested "Mus Musculus" data, and gave it the names "mm:musmusculus".

When I try to run the alignment using -s mm, it crashes as shown below. If I use musmusculus instead, it works fine.

mixcr align -r log_align.txt -l TRA -s mm 150814_TCR_Valpha8_MID10_S7_L001_R1_001.fastq 150814_TCR_Valpha8_MID10_S7_L001_R2_001.fastq alignments.vdjca -f
Alignment: 3.3%
Exception in thread "main" java.lang.NullPointerException
    at com.milaboratory.mixcr.vdjaligners.VDJCAlignerPVFirst.createInitialHelper(VDJCAlignerPVFirst.java:127)
    at com.milaboratory.mixcr.vdjaligners.VDJCAlignerPVFirst.createInitialHelpers(VDJCAlignerPVFirst.java:122)
    at com.milaboratory.mixcr.vdjaligners.VDJCAlignerPVFirst.process(VDJCAlignerPVFirst.java:58)
    at com.milaboratory.mixcr.vdjaligners.VDJCAlignerWithMerge.process(VDJCAlignerWithMerge.java:86)
    at com.milaboratory.mixcr.vdjaligners.VDJCAlignerWithMerge.process(VDJCAlignerWithMerge.java:42)
    at cc.redberry.pipe.CUtils$9.process(CUtils.java:350)
    at cc.redberry.pipe.CUtils$9.process(CUtils.java:345)
    at cc.redberry.pipe.blocks.ParallelProcessor$Worker.run(ParallelProcessor.java:304)
    at java.lang.Thread.run(Thread.java:745)
dbolotin commented 8 years ago

Add the --library local option to activate your imported library of segments (your current command still uses the built-in library, where mm is not defined but musmusculus is).
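
The fix works because species names are resolved inside whichever segment library is active. The toy model below uses hypothetical structures, not MiXCR's code. It assumes that a name list such as "mm:musmusculus" given to importFromIMGT.sh is a colon-separated list of aliases, and that the built-in library registers only musmusculus for mouse. An unchecked lookup of "mm" in the built-in library then yields nothing, which is the kind of missing value that later surfaces as a NullPointerException.

```python
from typing import Dict, List, Optional


def parse_species_names(spec: str) -> List[str]:
    """Split an import-time name list such as "mm:musmusculus" into aliases.

    Assumption: ':' separates alternative names for the same species."""
    return [name.strip().lower() for name in spec.split(":") if name.strip()]


def build_library(entries: Dict[str, str]) -> Dict[str, str]:
    """Map every alias to a canonical species id (hypothetical structure)."""
    library: Dict[str, str] = {}
    for spec, species_id in entries.items():
        for alias in parse_species_names(spec):
            library[alias] = species_id
    return library


# Hypothetical contents: the built-in library knows only the long name,
# while the locally imported one also registered the short alias "mm".
BUILT_IN = build_library({"musmusculus": "mouse"})
LOCAL = build_library({"mm:musmusculus": "mouse"})


def lookup(library: Dict[str, str], name: str) -> Optional[str]:
    # Returns None for unknown names; using that result later without a
    # check is what turns a typo into a crash deep inside the pipeline.
    return library.get(name.lower())


print(lookup(BUILT_IN, "mm"))           # None
print(lookup(LOCAL, "mm"))              # mouse
print(lookup(BUILT_IN, "musmusculus"))  # mouse
```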

Let me know whether it worked.

P.S. The IMGT library for mouse TRA/TRD is badly formatted, which results in wrong CDR3 boundaries. I am currently releasing a version with a fix for this problem (1.7.1). I hope it will be done in two to three hours.

fabio-t commented 8 years ago

Works great, thanks!

Might I suggest some input validation on that option, though? An error telling me "species name unrecognised" or something similar would be more helpful than a NullPointerException :)
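
The suggested validation could run once, before any reads are processed, so an unknown name fails with a readable message instead of a NullPointerException inside the aligner. A minimal sketch follows; `resolve_species`, `UnknownSpeciesError` and the library contents are hypothetical, and MiXCR's actual fix may look different.

```python
from typing import Dict


class UnknownSpeciesError(ValueError):
    """Raised when -s names a species absent from the selected library."""


def resolve_species(library: Dict[str, str], name: str, library_name: str) -> str:
    """Validate the species name up front and fail with a human-readable message."""
    species_id = library.get(name.lower())
    if species_id is None:
        known = ", ".join(sorted(library))
        raise UnknownSpeciesError(
            f"Species name '{name}' is not recognised in the {library_name} library. "
            f"Known names: {known}."
        )
    return species_id


built_in = {"musmusculus": "mouse"}  # hypothetical library contents

try:
    resolve_species(built_in, "mm", "built-in")
except UnknownSpeciesError as err:
    # Species name 'mm' is not recognised in the built-in library. Known names: musmusculus.
    print(err)
```

Listing the known names in the message also hints at the real cause here: the wrong library was active.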

I will be eagerly waiting for the fix on mouse TRA; thanks for letting me know!

dbolotin commented 8 years ago

You are totally right; I will add a human-readable error message here! Thanks for the suggestion.

dbolotin commented 8 years ago

Related to #70.

dbolotin commented 8 years ago

Just finished: https://github.com/milaboratory/mixcr/releases/tag/v1.7.1

Please let me know if there are any problems with it.

fabio-t commented 8 years ago

It doesn't seem to be working; the sequences have the wrong left boundary:

TACTACTGTGCTTTGAGTAGCAACTATCAGTTGATCTGG YYCALSSNYQLIW
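
The wrong left boundary shows up directly when the reported nucleotide sequence is translated: two residues (YY) sit in front of the Cys. The sketch below assumes that a correctly bounded CDR3 begins at the conserved Cys; `translate` and `left_overhang` are illustrative helpers, not part of MiXCR.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate in frame 0; a trailing partial codon is ignored, unknown codons become 'X'."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODONS.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def left_overhang(cdr3_aa: str) -> int:
    """Residues before the first Cys, or -1 if there is none.

    Assumption: a correctly bounded CDR3 starts with the conserved Cys,
    so anything before it is extra sequence on the left."""
    return cdr3_aa.find("C")


aa = translate("TACTACTGTGCTTTGAGTAGCAACTATCAGTTGATCTGG")
extra = left_overhang(aa)
print(aa)          # YYCALSSNYQLIW
print(extra)       # 2 (two extra codons, i.e. 6 nt, on the left)
print(aa[extra:])  # CALSSNYQLIW
```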

I updated with brew, and it reports version 1.7-2 there. However, the version reported by mixcr itself seems wrong:

$ mixcr --version
MiXCR v1.7 (built Tue Dec 29 10:58:03 CET 2015; rev=87f1b07; branch=release/v1.7)
Components: 
MiLib v1.2 (rev=4f56782; branch=release/v1.2)
MiTools v1.2 (rev=eb91603; branch=release/v1.2)

Could it be that 1.7.1 didn't build properly?
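
Comparing the version the binary reports with the one the package manager claims to have installed catches this kind of mismatch early. A minimal sketch, assuming the output format of the single `mixcr --version` sample above (other releases may format it differently); the helper name is hypothetical.

```python
import re
from typing import Dict, Optional

# Fitted to the single `mixcr --version` line quoted in this thread (assumption).
VERSION_RE = re.compile(
    r"MiXCR v(?P<version>[\w.\-]+) \(built (?P<built>[^;]+); "
    r"rev=(?P<rev>\w+); branch=(?P<branch>[^)]+)\)"
)


def parse_mixcr_version(output: str) -> Optional[Dict[str, str]]:
    """Return version, build date, revision and branch, or None if no line matches."""
    for line in output.splitlines():
        match = VERSION_RE.match(line.strip())
        if match:
            return match.groupdict()
    return None


SAMPLE = """MiXCR v1.7 (built Tue Dec 29 10:58:03 CET 2015; rev=87f1b07; branch=release/v1.7)
Components:
MiLib v1.2 (rev=4f56782; branch=release/v1.2)
MiTools v1.2 (rev=eb91603; branch=release/v1.2)"""

info = parse_mixcr_version(SAMPLE)
expected = "1.7.1"
if info is None or info["version"] != expected:
    print(f"expected MiXCR {expected}, binary reports {info and info['version']}")
```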

dbolotin commented 8 years ago

Yep, thanks! I just discovered it too. It turned out that, for Homebrew, "1.7.1-1" < "1.7-2". Please try to update mixcr one more time: brew update && brew upgrade mixcr.
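
One way to end up with "1.7.1-1" sorting below "1.7-2" is to treat the text after '-' as just another numeric component, so the third components 1 and 2 get compared. The sketch below reproduces that pitfall and contrasts it with comparing the upstream version before the suffix. It illustrates the general problem and is not a description of Homebrew's actual comparison rules; treating the '-N' suffix as a package revision is likewise an assumption.

```python
import re
from typing import List, Tuple


def naive_key(version: str) -> List[int]:
    """Treat every '.'- or '-'-separated number as an ordinary version component."""
    return [int(part) for part in re.split(r"[.\-]", version)]


def split_key(version: str) -> Tuple[List[int], int]:
    """Compare the upstream version first, then the revision after '-'.

    Assumption: the suffix after '-' is a package revision."""
    upstream, _, revision = version.partition("-")
    return [int(p) for p in upstream.split(".")], int(revision or 0)


a, b = "1.7.1-1", "1.7-2"
print(naive_key(a) < naive_key(b))  # True: [1, 7, 1, 1] < [1, 7, 2]
print(split_key(a) < split_key(b))  # False: [1, 7, 1] > [1, 7]
```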

Also, to correct the wrong boundaries, you should re-import the segments from IMGT.

fabio-t commented 8 years ago

The boundaries work fine... mostly :) I got a few strange sequences that I'm not sure make sense:

1       GAAGTACAGGGCAGAGTCTGACAGCTGCGCTGATCAGCGCAGCTGTGATACGCGTTTTCGTGCTTCTTTTGTGTGTGGTTGGGGGTGTGGGGCACTGGGCACCTTTTTTTTGGATGTGAAACACAGTTGACAGTCATCCCAAACATCCAGACCCCCGAACCTGCTGTGTACCAGTTAAAAGATCCTCGGTCTCAGGACATCCCCCTCTACTCGTCTGAACTCCAGTCACTTTG       EVQGRV*QLR*SAQL*YAFSCFFCVWLGVWGTGHLFFGC_NTVDSHPKHPDPRTCCVPVKRSSVSGHPPLLV*TPVTL
1       TGTGCTTTGAGTGAATACTTCCAGAACCCAGAACATGCTATGTACCAGGTAAAAGATCCTCGGCATCAGGCCAACACCCTCTACACACTTCCACTCCAGTCACATCACGCTCACGTTTGCCGTCTTCTACGCAACCGCCACCAGTTAAATAGCTTGCAAATTACGTGGCCTTATGGCTACAGTATGCCCATCACAATTAGCAACAATCACGACACTTT       CALSEYFQNPEHAMYQVKDPRHQANTLYTLPLQSHH_SRLPSSTQPPPVK*LANYVALWLQYAHHN*QQSRHF
1       TGTGCTTTGAGGGACGGGCAAGACCGGTAAACTGAGGTCTTGGCTGGTGAGAACCGATCGATGGCAACCACTCATCTAGAACACCCAGAATCCGCGGTGCCCGAGAAAAAATCTCCGCGGTCACAGCACCGCACCCTACACGCGTCTGCTCCCGTCACACTACCTTATCGCGTGCGCCCTCTTCCGTTTAAAAAAAAAAAACCTACAAAACTCTCTCT       CALRDGQDR*TEVLAGENRSMATTHLEHPESAVPEK_ISAVTAPHPTRVCSRHTTLSRAPSSV*KKKTYKTLS
1       TGTGCTCTGAGTCCGGACTTGAATATATGAAACTGCCTTTGCAAGTTTGACTCAGGTGCTAGACTGCACTGACATCCACAACCCAAGAACTCCGAAGTACCAGTTAACCGGTCCACAGATTCTCGGCGGCACCCTCTCCCCCTCTACACGCCTGAAACATAGCCACTTCCCATTCCCGCTTCCCCTTTTCTCCTAAAAAAAAAAAAAACATAACCTCT       CALSPDLNI*NCLCKFDSGARLH*HPQPKNSEVPVN_GPQILGGTLSPSTRLKHSHFPFPLPLFS*KKKNITS
1       TGTGCTCTGAGTGCTAGCAAGATGGGCTATAAAGTTACTGTAGGGTGAAGCACAAACTTGGTGGTTGATCCAACCATCCAGCCCCCAGACCCTACTGTGTACGAGTTAGATGATAGTCCTTCTCATGCCGGCCCCCTCTACCTGTCTGAATTCCAGTCACATCACGCTCGTTTTTGTCGGCTTATGCTTGCATCAAAAAATAAACTTTAACCTTTG       CALSASKMGYKVTVG*STNLVVDPTIQPPDPTVYELDDSPSHAGPLYLSEFQSHHARFCRLMLASKNKL*PL
1       TGTCTCTCATCACGATCTCGTATGCCGTCTTCTGCTTGAAATAAAAAACACCAAAAATAGCAACGACACCAAAAGCAGAATAACCAACAACATCAGAAATCTAGTTAGAAGAACATCCAAAATAAGAAAAGACACACCACTAATCCTGAGTACGCTGATAAAGTAATAAAATCTCACGACTACTGCTACAATAAAAAACTAAAAATAGGACATT       CLSSRSRMPSSA*NKKHQK*QRHQKQNNQQHQKSS*_RTSKIRKDTPLILSTLIK**NLTTTATIKN*K*DI
1       TGTGCTTTGAGACATCCAGAACCCAGAACCTGCTGTGTACCAGTTAAAAGATCCTCGGTCTCAGGACAGCACCCTCTACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAGTATCATCCAATACTTACTACATCTTCTATCAAAATATGTTTACTATCACTTCCCT       CALRHPEPRTCCVPVKRSSVSGQHPLHV*TPVTSRSRMPSSA*KKKKSIIQYLLHLLSKYVYYHFP
1       TGTGCTTTGAGACATCCAGAACCCAGAACCTGCTGTGTACCAGTTAAAAGATCCTCGGTCTCAGGACAGCACCCTCTACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACCCAAAACAATATTCCATACTCTAGTTCTCTCAACTCAAAAAAAAATCTATAAAAG       CALRHPEPRTCCVPVKRSSVSGQHPLHV*TPV_ITISYAVFCLKKKTQNNIPYSSSLNSKKNL*K
1       GTAGTACAGGGCAGAGTCTGACAGCTGCGCTGATCAGCGCAGCTGTCAGACTCTGCCCTGGTCTTCTATCCTCTCAGCCCTCCTGACCCCATACCCTGCTTTACAACATTTATCAGATCCTCGGTCTCATGACAGCCCCCTCTACACTTCTGAACACCATTCACTTTG       VVQGRV*QLR*SAQLSDSALVFYPLSPPDPIPCFTTFIRSSVS*QPPLHF*TPFTL

and also a few (I haven't counted them, but not many relative to the total) that don't respect the boundaries:

2       TATGCTTTGAGGATGTCTAATTACAACGTGCTTTACCTC YALRMSNYNVLYL
2       CGTGCTTTGAGTGAGATTATGCCCAGGGATCAACCTTC  RALSEI_AQGSTF
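
Rows like these can be flagged automatically from the exported clone table. The checks below are heuristics chosen for illustration, not MiXCR's own criteria: a nucleotide length that is not a multiple of three, a stop codon ('*'), a '_' in the translation (in the second row above it sits exactly where the 38-nt sequence leaves two unpaired nucleotides), and a CDR3 that does not start with Cys or end with Phe/Trp. The `cdr3_flags` helper is hypothetical.

```python
from typing import List, Tuple


def cdr3_flags(nt: str, aa: str) -> List[str]:
    """Heuristic sanity checks on one exported CDR3 (illustrative criteria)."""
    flags = []
    if len(nt) % 3:
        flags.append("length not a multiple of 3")
    if "*" in aa:
        flags.append("stop codon")
    if "_" in aa:
        flags.append("contains '_'")
    if not aa.startswith("C"):
        flags.append("no leading Cys")
    if not aa.endswith(("F", "W")):
        flags.append("no trailing Phe/Trp")
    return flags


# (count, nucleotide CDR3, amino-acid CDR3), copied from the two rows above.
rows: List[Tuple[int, str, str]] = [
    (2, "TATGCTTTGAGGATGTCTAATTACAACGTGCTTTACCTC", "YALRMSNYNVLYL"),
    (2, "CGTGCTTTGAGTGAGATTATGCCCAGGGATCAACCTTC", "RALSEI_AQGSTF"),
]
for count, nt, aa in rows:
    print(count, aa, cdr3_flags(nt, aa))
```

Reading clones.txt instead of the inline rows would mean skipping its header line and splitting on tabs, which is an assumption about the export layout.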

Maybe it's worth opening another issue? Or am I doing something wrong?

mixcr align -f --library local -l TRA -s mm r1.fastq r2.fastq  alignments.vdjca
mixcr assemble -f -ObadQualityThreshold=10 alignments.vdjca clones.clns
mixcr exportClones -count -sequence -aaFeature CDR3 -f clones.clns clones.txt
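
The three commands above can be kept consistent (same species, chain and library) by generating them from one place. This Python sketch only reuses the flags shown in this thread; the helper names are hypothetical, and by default it prints the commands instead of running them.

```python
import shlex
import subprocess
from typing import List


def mixcr_pipeline(r1: str, r2: str, species: str = "mm", chain: str = "TRA") -> List[List[str]]:
    """Build the align/assemble/export commands used in this thread (hypothetical helper)."""
    return [
        # --library local makes the imported segments (and the "mm" alias) visible.
        ["mixcr", "align", "-f", "--library", "local", "-l", chain, "-s", species,
         r1, r2, "alignments.vdjca"],
        ["mixcr", "assemble", "-f", "-ObadQualityThreshold=10",
         "alignments.vdjca", "clones.clns"],
        ["mixcr", "exportClones", "-count", "-sequence", "-aaFeature", "CDR3",
         "-f", "clones.clns", "clones.txt"],
    ]


def print_or_run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


print_or_run(mixcr_pipeline("r1.fastq", "r2.fastq"))
```
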
dbolotin commented 8 years ago

Yes, it's worth opening a dedicated issue for this.

Could you please describe the type of sample you are analysing?

fabio-t commented 8 years ago

Thanks for the reply; I will try the RNA-Seq-specific parameter. I can't provide the sample preparation details right now (it was not done by our lab), but I might come back here in the future with a new issue.

For now I'll be using MiXCR with all my data. Thanks for this great software.