Closed fabio-t closed 8 years ago
Add the --library local option to activate the imported library of segments (your current command still uses the built-in library, where mm is not defined, but musmusculus is).
Let me know whether it worked.
P.S. The IMGT library for mouse TRA / TRD is badly formatted, which results in wrong CDR3 boundaries. I am currently releasing a version with a fix for this problem (1.7.1). I hope it will be done in two to three hours.
Works great, thanks!
Although might I suggest some input validation on that option? Might be better to have an error telling me "species name unrecognised" or something similar than a NullPointerException :)
I will be eagerly waiting for the fix on mouse TRA; thanks for letting me know!
You are totally right, I will add a human-readable error message here! Thanks for the suggestion.
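A minimal sketch of the kind of validation suggested above, written in Python purely for illustration. It is not MiXCR's actual code, and the alias tables and the resolve_species helper are hypothetical names. The idea is to look the species name up explicitly and fail with a readable message instead of letting a missing entry surface later as a NullPointerException.

```python
# Hypothetical alias tables mirroring the situation in this thread:
# the built-in library only knows "musmusculus", while the locally
# imported one was registered with the names "mm:musmusculus".
BUILTIN_ALIASES = {"musmusculus": "musmusculus"}
LOCAL_ALIASES = {"mm": "musmusculus", "musmusculus": "musmusculus"}


def resolve_species(name, aliases):
    """Return the canonical species name or raise a readable error."""
    key = name.strip().lower()
    if key not in aliases:
        known = ", ".join(sorted(aliases))
        raise ValueError(
            f"species name unrecognised: {name!r} (known names: {known})"
        )
    return aliases[key]


if __name__ == "__main__":
    print(resolve_species("mm", LOCAL_ALIASES))
    try:
        resolve_species("mm", BUILTIN_ALIASES)
    except ValueError as err:
        print(f"error: {err}")
```

With a check like this, running with -s mm against a library that only defines musmusculus would stop immediately and list the accepted names.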
Related to #70.
Just finished: https://github.com/milaboratory/mixcr/releases/tag/v1.7.1
Please let me know if there are any problems with it.
Doesn't seem to be working, the sequences have a wrong left boundary:
TACTACTGTGCTTTGAGTAGCAACTATCAGTTGATCTGG YYCALSSNYQLIW
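To make the "wrong left boundary" concrete, here is a small sketch that translates the nucleotide sequence above with the standard genetic code and reports where the first Cys sits. The translate helper is written for this example only and is not part of MiXCR.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(nt):
    """Translate complete codons in frame 0; unknown codons become '?'."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "?") for i in range(0, usable, 3))


if __name__ == "__main__":
    seq = "TACTACTGTGCTTTGAGTAGCAACTATCAGTTGATCTGG"
    aa = translate(seq)
    print(aa)            # YYCALSSNYQLIW
    print(aa.find("C"))  # 2
```

The Cys is the third residue, so the reported sequence starts two codons (6 nt) upstream of it.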
I updated with brew, and it shows version 1.7-2. However, the mixcr version seems wrong:
$ mixcr --version
MiXCR v1.7 (built Tue Dec 29 10:58:03 CET 2015; rev=87f1b07; branch=release/v1.7)
Components:
MiLib v1.2 (rev=4f56782; branch=release/v1.2)
MiTools v1.2 (rev=eb91603; branch=release/v1.2)
Might be that 1.7.1 didn't build properly?
Yep, thanks! I also just discovered it. It turned out that "1.7.1-1" < "1.7-2" for Homebrew. Please try to update mixcr one more time: brew update && brew upgrade mixcr
Also, to correct wrong boundaries you should reimport segments from IMGT.
The boundary works fine.. mostly :) I got a few strange sequences that I'm not sure make sense:
1 GAAGTACAGGGCAGAGTCTGACAGCTGCGCTGATCAGCGCAGCTGTGATACGCGTTTTCGTGCTTCTTTTGTGTGTGGTTGGGGGTGTGGGGCACTGGGCACCTTTTTTTTGGATGTGAAACACAGTTGACAGTCATCCCAAACATCCAGACCCCCGAACCTGCTGTGTACCAGTTAAAAGATCCTCGGTCTCAGG
ACATCCCCCTCTACTCGTCTGAACTCCAGTCACTTTG EVQGRV*QLR*SAQL*YAFSCFFCVWLGVWGTGHLFFGC_NTVDSHPKHPDPRTCCVPVKRSSVSGHPPLLV*TPVTL
1 TGTGCTTTGAGTGAATACTTCCAGAACCCAGAACATGCTATGTACCAGGTAAAAGATCCTCGGCATCAGGCCAACACCCTCTACACACTTCCACTCCAGTCACATCACGCTCACGTTTGCCGTCTTCTACGCAACCGCCACCAGTTAAATAGCTTGCAAATTACGTGGCCTTATGGCTACAGTATGCCCATCACAA
TTAGCAACAATCACGACACTTT CALSEYFQNPEHAMYQVKDPRHQANTLYTLPLQSHH_SRLPSSTQPPPVK*LANYVALWLQYAHHN*QQSRHF
1 TGTGCTTTGAGGGACGGGCAAGACCGGTAAACTGAGGTCTTGGCTGGTGAGAACCGATCGATGGCAACCACTCATCTAGAACACCCAGAATCCGCGGTGCCCGAGAAAAAATCTCCGCGGTCACAGCACCGCACCCTACACGCGTCTGCTCCCGTCACACTACCTTATCGCGTGCGCCCTCTTCCGTTTAAAAAAA
AAAAACCTACAAAACTCTCTCT CALRDGQDR*TEVLAGENRSMATTHLEHPESAVPEK_ISAVTAPHPTRVCSRHTTLSRAPSSV*KKKTYKTLS
1 TGTGCTCTGAGTCCGGACTTGAATATATGAAACTGCCTTTGCAAGTTTGACTCAGGTGCTAGACTGCACTGACATCCACAACCCAAGAACTCCGAAGTACCAGTTAACCGGTCCACAGATTCTCGGCGGCACCCTCTCCCCCTCTACACGCCTGAAACATAGCCACTTCCCATTCCCGCTTCCCCTTTTCTCCTAA
AAAAAAAAAAACATAACCTCT CALSPDLNI*NCLCKFDSGARLH*HPQPKNSEVPVN_GPQILGGTLSPSTRLKHSHFPFPLPLFS*KKKNITS
1 TGTGCTCTGAGTGCTAGCAAGATGGGCTATAAAGTTACTGTAGGGTGAAGCACAAACTTGGTGGTTGATCCAACCATCCAGCCCCCAGACCCTACTGTGTACGAGTTAGATGATAGTCCTTCTCATGCCGGCCCCCTCTACCTGTCTGAATTCCAGTCACATCACGCTCGTTTTTGTCGGCTTATGCTTGCATCAA
AAAATAAACTTTAACCTTTG CALSASKMGYKVTVG*STNLVVDPTIQPPDPTVYELDDSPSHAGPLYLSEFQSHHARFCRLMLASKNKL*PL
1 TGTCTCTCATCACGATCTCGTATGCCGTCTTCTGCTTGAAATAAAAAACACCAAAAATAGCAACGACACCAAAAGCAGAATAACCAACAACATCAGAAATCTAGTTAGAAGAACATCCAAAATAAGAAAAGACACACCACTAATCCTGAGTACGCTGATAAAGTAATAAAATCTCACGACTACTGCTACAATAAAA
AACTAAAAATAGGACATT CLSSRSRMPSSA*NKKHQK*QRHQKQNNQQHQKSS*_RTSKIRKDTPLILSTLIK**NLTTTATIKN*K*DI
1 TGTGCTTTGAGACATCCAGAACCCAGAACCTGCTGTGTACCAGTTAAAAGATCCTCGGTCTCAGGACAGCACCCTCTACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAGTATCATCCAATACTTACTACATCTTCTATCAAAATATGTTTACTATCACTTCC
CT CALRHPEPRTCCVPVKRSSVSGQHPLHV*TPVTSRSRMPSSA*KKKKSIIQYLLHLLSKYVYYHFP
1 TGTGCTTTGAGACATCCAGAACCCAGAACCTGCTGTGTACCAGTTAAAAGATCCTCGGTCTCAGGACAGCACCCTCTACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACCCAAAACAATATTCCATACTCTAGTTCTCTCAACTCAAAAAAAAATCTATAAAAG
CALRHPEPRTCCVPVKRSSVSGQHPLHV*TPV_ITISYAVFCLKKKTQNNIPYSSSLNSKKNL*K
1 GTAGTACAGGGCAGAGTCTGACAGCTGCGCTGATCAGCGCAGCTGTCAGACTCTGCCCTGGTCTTCTATCCTCTCAGCCCTCCTGACCCCATACCCTGCTTTACAACATTTATCAGATCCTCGGTCTCATGACAGCCCCCTCTACACTTCTGAACACCATTCACTTTG VVQGRV*QLR*SAQLSDSAL
VFYPLSPPDPIPCFTTFIRSSVS*QPPLHF*TPFTL
and also a few (haven't counted them, but not many over the total) that don't respect the boundaries:
2 TATGCTTTGAGGATGTCTAATTACAACGTGCTTTACCTC YALRMSNYNVLYL
2 CGTGCTTTGAGTGAGATTATGCCCAGGGATCAACCTTC RALSEI_AQGSTF
Maybe it's worth opening another issue? Or am I doing something wrong?
mixcr align -f --library local -l TRA -s mm r1.fastq r2.fastq alignments.vdjca
mixcr assemble -f -ObadQualityThreshold=10 alignments.vdjca clones.clns
mixcr exportClones -count -sequence -aaFeature CDR3 -f clones.clns clones.txt
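For anyone wanting to count such clones rather than eyeball them, here is a rough Python sketch that flags suspicious CDR3 amino-acid strings like the ones pasted above. It rests on assumptions: that a well-formed CDR3 starts with Cys and ends with Phe or Trp, and that in the exported amino-acid column `*` marks a stop codon and `_` an incomplete codon. The cdr3_flags helper and the flag names are made up for this example.

```python
def cdr3_flags(aa):
    """Return a list of reasons why a CDR3 amino-acid string looks suspicious."""
    flags = []
    if not aa.startswith("C"):
        flags.append("no_leading_Cys")
    if not aa.endswith(("F", "W")):  # assumption: Phe or Trp closes the CDR3
        flags.append("no_trailing_Phe_or_Trp")
    if "*" in aa:                    # assumption: '*' denotes a stop codon
        flags.append("stop_codon")
    if "_" in aa:                    # assumption: '_' denotes an incomplete codon
        flags.append("incomplete_codon")
    return flags


if __name__ == "__main__":
    # Amino-acid strings taken from the examples in this thread.
    examples = ["YALRMSNYNVLYL", "RALSEI_AQGSTF", "YYCALSSNYQLIW"]
    for aa in examples:
        print(aa, cdr3_flags(aa) or ["ok"])
```

Applied to the third column of clones.txt, this would give a count per category, which helps separate boundary problems from stop codons and frameshifts.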
Yes, it's worth opening a dedicated issue for this.
Can you please describe the type of sample you are analysing?
For RNA-Seq data there are dedicated parameters (-p rna-seq at the alignment stage); they are specifically optimised to achieve a zero false-positive rate, and actually even have a slightly better yield (they are just not as universal as the default parameters).
As for the sequences not starting with Cys / ending with Phe, they don't look like wrong CDR3 boundaries, but more like PCR or sequencing errors (as you lowered the quality threshold) in the corresponding triplets. In light of this second issue, can you please also describe your sample preparation protocol in some detail, because both issues could arise from artefacts at this stage (I've seen lots of very complex sample preparation issues).
Thanks for the reply, I will try the rna-seq specific parameter. I cannot provide the sample preparation details right now (not done by our lab), but I might come back here in the future with a new issue.
For now I'll be using MiXCR with all my data. Thanks for this great software.
From within the mixcr installation directory, I ran importFromIMGT.sh and asked for "Mus Musculus" data, with names "mm:musmusculus".
When I try to run the alignment using -s mm, it crashes as below. If I use, instead, musmusculus, it works fine.