milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

UMI with Nanopore Data #1614

Closed bshim181 closed 7 months ago

bshim181 commented 7 months ago

Hello, I am seeing quite a large disparity in the number of clonotypes assembled before and after accounting for read UMIs. What might be the reason for this?

This is the assemble report when executed without read UMI consideration.

Analysis time: 21.1s
Final clonotype count: 1734
Reads used in clonotypes, percent of total: 1565874 (72.29%)
Average number of reads per clonotype: 903.04
Reads dropped due to the lack of a clone sequence, percent of total: 39103 (1.81%)
Reads dropped due to a too short clonal sequence, percent of total: 2 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 0 (0%)
Reads dropped with low quality clones, percent of total: 113001 (5.22%)
Aligned reads processed: 1878052
Reads used in clonotypes before clustering, percent of total: 1765049 (81.48%)
Number of reads used as a core, percent of used: 1765049 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 199175 (11.28%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 99045
Clonotypes eliminated by PCR error correction: 6848
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
IGH chains: 937 (54.04%)
IGH non-functional: 638 (68.09%)
IGK chains: 787 (45.39%)
IGK non-functional: 522 (66.33%)
IGL chains: 10 (0.58%)
IGL non-functional: 0 (0%)

This is the assemble report when I specified a 12 bp UMI at the 5' end. It looks as if almost 98% of the UMIs are concentrated in a single clonotype. What could explain this? Few alignments appear to be discarded, so I am wondering why all the other previously detected clones disappeared.

Analysis time: 12m
Final clonotype count: 87
Reads used in clonotypes, percent of total: 1715080 (79.18%)
Average number of reads per clonotype: 19713.56
Reads dropped due to the lack of a clone sequence, percent of total: 35531 (1.64%)
Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 0 (0%)
Reads dropped with low quality clones, percent of total: 505 (0.02%)
Aligned reads processed: 1720564
Reads used in clonotypes before clustering, percent of total: 1720059 (79.41%)
Number of reads used as a core, percent of used: 1720059 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 4979 (0.29%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 234
Clonotypes eliminated by PCR error correction: 236
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 4978 (0.23%)
IGH chains: 47 (54.02%)
IGH non-functional: 23 (48.94%)
IGK chains: 35 (40.23%)
IGK non-functional: 15 (42.86%)
IGL chains: 5 (5.75%)
IGL non-functional: 0 (0%)

Pre-clone assembler report:
Number of input groups: 93301
Number of input groups with no assembling feature: 12
Number of input alignments: 1805673
Number of alignments with assembling feature: 1770142 (98.03%)
Number of output pre-clones: 101943
Number of pre-clonotypes per group:
  0: + 744 (0.8%) = 744 (0.8%)
  1: + 83175 (89.16%) = 83919 (89.96%)
  2: + 9342 (10.01%) = 93261 (99.97%)
  3: + 28 (0.03%) = 93289 (100%)
Number of assembling feature sequences in groups with zero pre-clonotypes: 2623
Number of dropped pre-clones by tag suffix conflict: 0
Number of dropped alignments by tag suffix conflict: 0
Number of core alignments: 1702472 (94.28%)
Discarded core alignments: 67670 (3.97%)
Empirically assigned alignments: 18092 (1%)
Empirical assignment conflicts: 0 (0%)
Tag+VJ-gene empirically assigned alignments: 18092 (1%)
VJ-gene empirically assigned alignments: 0 (0%)
Tag empirically assigned alignments: 0 (0%)
Number of ambiguous groups: 9370
Number of ambiguous V-genes: 38
Number of ambiguous J-genes: 19
Number of ambiguous tag+V/J-gene combinations: 57
Ignored non-productive alignments: 0 (0%)
Unassigned alignments: 85043 (4.71%)
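
For readers comparing the two reports, the headline figures can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied by hand from the reports above, the dictionary keys and the helper name `reads_per_clonotype` are made up for illustration, and nothing here calls MiXCR itself.

```python
# Figures copied by hand from the two assemble reports above.
# The dictionary keys are illustrative names, not MiXCR report fields.
no_umi = {"clonotypes": 1734, "reads_in_clonotypes": 1565874}
with_umi = {"clonotypes": 87, "reads_in_clonotypes": 1715080}


def reads_per_clonotype(report):
    """Mean number of reads per final clonotype, rounded as in the report."""
    return round(report["reads_in_clonotypes"] / report["clonotypes"], 2)


# Histogram "Number of pre-clonotypes per group" from the UMI run:
# key = pre-clonotypes found in a UMI group, value = number of groups.
pre_clonotypes_per_group = {0: 744, 1: 83175, 2: 9342, 3: 28}

# Input groups minus groups with no assembling feature.
groups_with_feature = 93301 - 12

# The histogram accounts for every group that has an assembling feature.
total_groups = sum(pre_clonotypes_per_group.values())

# Share of UMI groups that resolve to exactly one pre-clonotype.
single_share = round(100 * pre_clonotypes_per_group[1] / total_groups, 2)

print(reads_per_clonotype(no_umi), reads_per_clonotype(with_umi))
print(total_groups == groups_with_feature, single_share)
```

Both averages match the "Average number of reads per clonotype" lines, and the histogram sums to the 93289 groups that have an assembling feature, so the two reports are at least internally consistent.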

mizraelson commented 7 months ago

Hi, the report looks fine, and most likely the results you get are accurate. UMIs help to reduce artificial diversity by correcting errors, so it is only logical that you see a lower number of clones. My guess is that the top clonotypes are the same in both runs, and the ones that were corrected are the singletons.
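
To illustrate the point about artificial diversity, here is a toy sketch of how grouping reads by UMI can collapse error-derived sequences. It is not MiXCR's actual algorithm: the majority-vote consensus, the function names, and the example reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def clonotypes_without_umi(reads):
    """Count every distinct sequence as its own clonotype (no correction)."""
    return Counter(seq for _, seq in reads)


def clonotypes_with_umi(reads):
    """Group reads by UMI, keep the most frequent sequence per UMI,
    then count molecules (UMIs) per consensus sequence."""
    by_umi = defaultdict(Counter)
    for umi, seq in reads:
        by_umi[umi][seq] += 1
    consensus = Counter()
    for seqs in by_umi.values():
        top_seq, _ = seqs.most_common(1)[0]
        consensus[top_seq] += 1
    return consensus


# Hypothetical (UMI, CDR3) reads: three molecules, each with one erroneous read.
reads = [
    ("AAAC", "CARDYW"), ("AAAC", "CARDYW"), ("AAAC", "CARDYW"), ("AAAC", "CARDFW"),
    ("GGTA", "CARDYW"), ("GGTA", "CARDYW"), ("GGTA", "CTRDYW"),
    ("TTCG", "CAKGGW"), ("TTCG", "CAKGGW"), ("TTCG", "CAKGSW"),
]

print(len(clonotypes_without_umi(reads)))  # 5 distinct sequences
print(dict(clonotypes_with_umi(reads)))    # {'CARDYW': 2, 'CAKGGW': 1}
```

In this toy input the three single-read variants each look like a separate clonotype when UMIs are ignored, and they disappear once reads sharing a UMI are collapsed to one consensus sequence.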