milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

D assignment #80

Closed swuecho closed 8 years ago

swuecho commented 8 years ago

I got this result from one of my analyses.

Command: mixcr exportClones -vHit -jHit -dHit -count -aaFeature CDR3, in mixcr version 1.6:

IGHV4-59*00 IGHJ4*00    IGHD6-19*00 87  CAGGSGLPYW
IGHV4-59*00 IGHJ4*00    IGHD2-15*00 84  CAGGSGLPYW

In mixcr version 1.7.2:

IGHV4-59*00 IGHJ4*00    IGHD6-19*00 88  CAGGSGLPYW
IGHV4-59*00 IGHJ4*00    IGHD3-22*00 85  CAGGSGLPYW

It seems the D assignment is not that accurate?

Thanks.

dbolotin commented 8 years ago

Could you post the nucleotide sequences of the corresponding clonotypes, along with the alignment fields and D alignment scores? You can simply copy the corresponding lines from the default export output of either the alignments or the assembled clonotypes.

swuecho commented 8 years ago

align

assemble

The above results were generated by:

/home/hwu/app/mixcr-1.7.2/mixcr align --loci IGH seq.fastq seq.vdjca
/home/hwu/app/mixcr-1.7.2/mixcr assemble seq.vdjca seq.clns
/home/hwu/app/mixcr-1.7.2/mixcr exportAlignments seq.vdjca alignments.txt
/home/hwu/app/mixcr-1.7.2/mixcr exportClones  seq.clns clones.txt
perl -n -E 'print if /CAGGSGLPYW/' clones.txt >clones_debug.txt
perl -n -E 'print if /CAGGSGLPYW/' alignments.txt > alignment_debug.txt

I did not try mixcr 1.6, since I saw the new version 1.7.2.

dbolotin commented 8 years ago

I don't see any problems with these two clones:

There are two clones with different nucleotide sequences, and with two different D genes that were most likely used during their rearrangement:

Clone1:
    TGTGCGGGAGGCAGTGGTCTCCCCTACTGG
       gggtataGCAGTGGctggtac         - IGHD6-19

Clone2:
    TGTGCGGGAGGTAGTGGTCTCCCCTACTGG
gtattactatgataGTAGTGGTtattactac      - IGHD3-22

So the alignments are OK, and the D genes were assigned as well as possible. However, as there are many similar D genes for IGH, it is practically impossible to tell for sure which D gene was used in each particular rearrangement. The situation is further complicated by hypermutations (which seem to be the case here).

If D genes are of particular interest in your research, you should adopt an approach like the one described by Aleksandra Walczak and colleagues in the following papers: http://www.pnas.org/content/109/40/16161.short http://rstb.royalsocietypublishing.org/content/370/1676/20140243.abstract
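
To make the ambiguity concrete, here is a rough Python sketch that compares each clone with each D fragment shown above, using the longest exact shared substring as a crude stand-in for an alignment score. This is an assumption made purely for illustration: it is not how MiXCR scores D alignments, the helper name `longest_common_substring` is hypothetical, and the D sequences are only the fragments quoted in this thread.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest exact substring shared by a and b (case-insensitive)."""
    a, b = a.upper(), b.upper()
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths ending at the previous position of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Sequences copied from the alignment shown in this thread.
clones = {
    "Clone1": "TGTGCGGGAGGCAGTGGTCTCCCCTACTGG",
    "Clone2": "TGTGCGGGAGGTAGTGGTCTCCCCTACTGG",
}
d_segments = {
    "IGHD6-19": "gggtataGCAGTGGctggtac",
    "IGHD3-22": "gtattactatgataGTAGTGGTtattactac",
}

# Exact-match length only; a real aligner also scores mismatches and gaps.
for clone_name, clone_seq in clones.items():
    for d_name, d_seq in d_segments.items():
        match = longest_common_substring(clone_seq, d_seq)
        print(f"{clone_name} vs {d_name}: {len(match)} nt ({match})")
```

Under this simplified measure, each clone matches its assigned D gene only slightly better than the alternative, and the matched stretches are just a few nucleotides long, so a single point difference in the CDR3 is enough to change which D gene ranks first.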

swuecho commented 8 years ago

Thanks!