UMI with Nanopore Data

Hello, I see a large disparity in the number of clonotypes assembled before and after accounting for read UMIs (1734 without UMIs versus 87 with them). What might be the reason for this?

This is the assemble report from the run that did not take read UMIs into account.

Analysis time: 21.1s
Final clonotype count: 1734
Reads used in clonotypes, percent of total: 1565874 (72.29%)
Average number of reads per clonotype: 903.04
Reads dropped due to the lack of a clone sequence, percent of total: 39103 (1.81%)
Reads dropped due to a too short clonal sequence, percent of total: 2 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 0 (0%)
Reads dropped with low quality clones, percent of total: 113001 (5.22%)
Aligned reads processed: 1878052
Reads used in clonotypes before clustering, percent of total: 1765049 (81.48%)
Number of reads used as a core, percent of used: 1765049 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 199175 (11.28%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 99045
Clonotypes eliminated by PCR error correction: 6848
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
IGH chains: 937 (54.04%)
IGH non-functional: 638 (68.09%)
IGK chains: 787 (45.39%)
IGK non-functional: 522 (66.33%)
IGL chains: 10 (0.58%)
IGL non-functional: 0 (0%)

This is the assemble report from the run where I specified a 12 bp UMI at the 5' end. Almost 98% of the UMIs appear to be concentrated in a single clonotype. What could explain this? Few alignments seem to be discarded, so I don't understand why all the other clones detected previously disappeared.

Analysis time: 12m
Final clonotype count: 87
Reads used in clonotypes, percent of total: 1715080 (79.18%)
Average number of reads per clonotype: 19713.56
Reads dropped due to the lack of a clone sequence, percent of total: 35531 (1.64%)
Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 0 (0%)
Reads dropped with low quality clones, percent of total: 505 (0.02%)
Aligned reads processed: 1720564
Reads used in clonotypes before clustering, percent of total: 1720059 (79.41%)
Number of reads used as a core, percent of used: 1720059 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 4979 (0.29%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 234
Clonotypes eliminated by PCR error correction: 236
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 4978 (0.23%)
IGH chains: 47 (54.02%)
IGH non-functional: 23 (48.94%)
IGK chains: 35 (40.23%)
IGK non-functional: 15 (42.86%)
IGL chains: 5 (5.75%)
IGL non-functional: 0 (0%)

Pre-clone assembler report:
Number of input groups: 93301
Number of input groups with no assembling feature: 12
Number of input alignments: 1805673
Number of alignments with assembling feature: 1770142 (98.03%)
Number of output pre-clones: 101943
Number of pre-clonotypes per group:
  0: + 744 (0.8%) = 744 (0.8%)
  1: + 83175 (89.16%) = 83919 (89.96%)
  2: + 9342 (10.01%) = 93261 (99.97%)
  3: + 28 (0.03%) = 93289 (100%)
Number of assembling feature sequences in groups with zero pre-clonotypes: 2623
Number of dropped pre-clones by tag suffix conflict: 0
Number of dropped alignments by tag suffix conflict: 0
Number of core alignments: 1702472 (94.28%)
Discarded core alignments: 67670 (3.97%)
Empirically assigned alignments: 18092 (1%)
Empirical assignment conflicts: 0 (0%)
Tag+VJ-gene empirically assigned alignments: 18092 (1%)
VJ-gene empirically assigned alignments: 0 (0%)
Tag empirically assigned alignments: 0 (0%)
Number of ambiguous groups: 9370
Number of ambiguous V-genes: 38
Number of ambiguous J-genes: 19
Number of ambiguous tag+V/J-gene combinations: 57
Ignored non-productive alignments: 0 (0%)
Unassigned alignments: 85043 (4.71%)
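
To put the two reports side by side, here is a minimal Python sketch that pulls a few metrics out of the report text and compares them. It is only an illustration: `extract_metrics` and `LABELS` are hypothetical helpers written for this post, not part of MiXCR, and the excerpts are copied from the reports above. It assumes each metric appears as `label: number`, as it does in the pasted output.

```python
import re

# Metric labels copied verbatim from the assemble reports above.
LABELS = [
    "Final clonotype count",
    "Aligned reads processed",
    "Clonotypes dropped as low quality",
    "Clonotypes eliminated by PCR error correction",
    "Number of input groups",
    "Number of output pre-clones",
]


def extract_metrics(report_text):
    """Return {label: number} for every known label found in the text.

    Hypothetical helper (not a MiXCR API). It matches 'label: number',
    so it works on one-line and multi-line report text alike.
    """
    metrics = {}
    for label in LABELS:
        match = re.search(re.escape(label) + r":\s*(\d+(?:\.\d+)?)", report_text)
        if match:
            metrics[label] = float(match.group(1))
    return metrics


# Excerpts of the two reports (values as posted above).
no_umi = extract_metrics(
    "Final clonotype count: 1734 Aligned reads processed: 1878052 "
    "Clonotypes dropped as low quality: 99045 "
    "Clonotypes eliminated by PCR error correction: 6848"
)
with_umi = extract_metrics(
    "Final clonotype count: 87 Aligned reads processed: 1720564 "
    "Clonotypes dropped as low quality: 234 "
    "Clonotypes eliminated by PCR error correction: 236 "
    "Number of input groups: 93301 Number of output pre-clones: 101943"
)

# How many times fewer clonotypes the UMI run reports.
fold = no_umi["Final clonotype count"] / with_umi["Final clonotype count"]
# Average number of pre-clones produced per UMI group in the UMI run.
preclones_per_group = (
    with_umi["Number of output pre-clones"] / with_umi["Number of input groups"]
)

print(f"Clonotype count drops {fold:.1f}-fold with UMIs")
print(f"Pre-clones per UMI group: {preclones_per_group:.2f}")
```

The point of the comparison is that the UMI run still produces about one pre-clone per UMI group (101943 pre-clones from 93301 groups), yet ends up with only 87 clonotypes, so most of the collapse seems to happen after the pre-clone step rather than through discarded alignments.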

milaboratory / mixcr

UMI with Nanopore Data #1614