milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.

https://mixcr.com


Output #1261

Closed bshim181 closed 11 months ago

bshim181 commented 12 months ago

Hello, when looking at the MiXCR report, I find two TCR clonotypes with identical CDR3 amino acid sequences and 1 mismatch between their CDR3 nt sequences. Both have the same V and J segments.

I was wondering why they have not been clustered into a single clonotype, given that I set the assembling gene feature to CDR3 and allowed 2 mismatches or 1 indel.

mizraelson commented 12 months ago

Hi, Could you please provide the exact command you used?

To address your question, we should first examine the alignments that support each clone. This can be done using the mixcr exportAlignments input.vdjca output.tsv command. It's likely that a comparable number of reads supports each variant, because clustering only happens if one clone has a significantly lower count than the other.
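For anyone following along, here is a minimal sketch of how the exported alignments could be tallied per CDR3 nucleotide sequence. It assumes the TSV contains a column named nSeqCDR3 holding the CDR3 nucleotide sequence; whether that column is present depends on the export fields chosen, so treat the column name as an assumption. The helper names `count_reads_by_cdr3` and `hamming` are hypothetical and not part of MiXCR.

```python
import csv
from collections import Counter


def count_reads_by_cdr3(lines, column="nSeqCDR3"):
    """Count alignment rows per CDR3 nucleotide sequence.

    `lines` is any iterable of TSV lines (an open file works).
    `column` is ASSUMED to hold the CDR3 nucleotide sequence.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        seq = row.get(column)
        if seq:  # skip rows with an empty or missing CDR3 value
            counts[seq] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


# Example usage (file name taken from the command above):
#
#   with open("output.tsv", newline="") as fh:
#       counts = count_reads_by_cdr3(fh)
#   for seq, n in counts.most_common(10):
#       print(n, seq)
```

Comparing the two most abundant sequences with `hamming` shows whether they really differ by a single nucleotide, and the counts show how many alignments support each variant.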

Sincerely, Mark

bshim181 commented 12 months ago

I basically used the generic-amplicon preset and only changed parameters at the exportClones step (I believe generic-amplicon by default clusters by CDR3 sequence, allowing at most 2 nt mismatches or 1 indel, but not both).

bshim181 commented 12 months ago

Since these clonotypes have identical CDR3 amino acid sequences (with a 1 nt difference) and identical V and J segments, does it make sense to combine or cluster them together?

The read count comparison is 284 vs 5. What is the threshold for valid number of reads that support each variant?

mizraelson commented 12 months ago

The decision is somewhat more complex than simply comparing read counts. First, the sequencing quality of the differing nucleotide is considered. Then a negative binomial distribution is applied to assess whether this nucleotide might be the result of an error, accounting for both the read proportion of the clones and the sequencing quality.
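To make the idea concrete, here is a rough illustration of that kind of reasoning. It is not MiXCR's actual implementation, and the parameterization is an assumption chosen for illustration: the Phred quality is converted to a per-base error probability, and a negative binomial tail gives the chance of seeing at least as many erroneous reads as the minor variant has before accumulating the major clone's correct reads. The function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def neg_binomial_tail(parent_reads, variant_reads, error_prob):
    """P(X >= variant_reads) for X ~ NegBin(r=parent_reads, p=error_prob).

    X counts erroneous reads ("failures") observed before `parent_reads`
    correct reads ("successes"). ASSUMPTION: every error at this position
    produces the observed variant, which overstates the error rate.
    """
    r = parent_reads
    below = 0.0
    for k in range(variant_reads):
        # Negative binomial pmf: C(k + r - 1, k) * (1 - p)^r * p^k
        below += math.comb(k + r - 1, k) * (1 - error_prob) ** r * error_prob ** k
    return max(0.0, 1.0 - below)


# Read counts from this thread: 284 vs 5, at two hypothetical qualities.
for q in (10, 30):
    p = neg_binomial_tail(284, 5, phred_to_error(q))
    print(f"Q{q}: P(at least 5 error reads) = {p:.3g}")
```

Under these assumptions, a low-quality base makes 5 error reads out of 289 quite plausible, while a high-quality base makes them unlikely, in which case the minor variant would more reasonably be kept as a separate clone.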