bshim181 closed this issue 12 months ago.
Hi, the command seems fine, but I think the issue lies in the length of your UMI. 7 nt is quite short for sufficient diversity, so what most likely happened is that multiple CDR3s were assigned to the same UMI, which made it impossible to build a single consensus. Can you please share the assemble report to confirm?
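To put the point about UMI length in numbers: a UMI of length L over {A, C, G, T} can take 4^L values, so 7 nt gives 16,384 possible UMIs versus 16,777,216 for 12 nt. The sketch below uses a simple birthday-problem model to estimate how often a molecule shares its UMI with another molecule. It assumes UMIs are drawn uniformly and independently per molecule, and the molecule count is a hypothetical placeholder. A shared UMI only causes a conflict when the two molecules carry different CDR3s, so under this model the result bounds the problem from above rather than predicting it for this dataset.

```python
def umi_space(length: int) -> int:
    """Number of distinct UMIs of the given length over the alphabet {A, C, G, T}."""
    return 4 ** length


def shared_umi_probability(n_molecules: int, umi_length: int) -> float:
    """Probability that one molecule shares its UMI with at least one of the other
    n_molecules - 1 molecules, assuming uniform, independent UMI draws."""
    n_umis = umi_space(umi_length)
    return 1.0 - (1.0 - 1.0 / n_umis) ** (n_molecules - 1)


# Hypothetical number of tagged molecules; substitute your own estimate.
N_MOLECULES = 10_000

for length in (7, 12):
    p = shared_umi_probability(N_MOLECULES, length)
    print(f"{length} nt: {umi_space(length):,} possible UMIs, "
          f"P(molecule shares its UMI) ~ {p:.1%}")
```

With these assumptions, roughly 46% of molecules share their UMI at 7 nt, compared with under 0.1% at 12 nt.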
Hello, this is the assemble report. There seems to be a high number of assembling feature sequences in groups with zero pre-clonotypes: 178768
Analysis time: 1.07m
Final clonotype count: 3104
Reads used in clonotypes, percent of total: 40694 (16.48%)
Average number of reads per clonotype: 13.11
Reads dropped due to the lack of a clone sequence, percent of total: 1239 (0.5%)
Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 410 (0.17%)
Reads dropped with low quality clones, percent of total: 194 (0.08%)
Aligned reads processed: 41298
Reads used in clonotypes before clustering, percent of total: 40694 (16.48%)
Number of reads used as a core, percent of used: 40645 (99.88%)
Mapped low quality reads, percent of used: 49 (0.12%)
Reads clustered in PCR error correction, percent of used: 0 (0%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 28
Clonotypes eliminated by PCR error correction: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Alignments filtered by tag prefix: 0 (0%)
TRA chains: 1870 (60.24%)
TRA non-functional: 188 (10.05%)
TRB chains: 1234 (39.76%)
TRB non-functional: 23 (1.86%)
Pre-clone assembler report:
Number of input groups: 5747
Number of input groups with no assembling feature: 1
Number of input alignments: 237344
Number of alignments with assembling feature: 236105 (99.48%)
Number of output pre-clones: 3920
Number of pre-clonotypes per group:
0: + 3111 (54.14%) = 3111 (54.14%)
1: + 1469 (25.57%) = 4580 (79.71%)
2: + 1047 (18.22%) = 5627 (97.93%)
3: + 119 (2.07%) = 5746 (100%)
Number of assembling feature sequences in groups with zero pre-clonotypes: 178768
Number of dropped pre-clones by tag suffix conflict: 0
Number of dropped alignments by tag suffix conflict: 0
Number of core alignments: 41259 (17.38%)
Discarded core alignments: 194846 (472.25%)
Empirically assigned alignments: 39 (0.02%)
Empirical assignment conflicts: 0 (0%)
Tag+VJ-gene empirically assigned alignments: 39 (0.02%)
VJ-gene empirically assigned alignments: 0 (0%)
Tag empirically assigned alignments: 0 (0%)
Number of ambiguous groups: 1166
Number of ambiguous V-genes: 87
Number of ambiguous J-genes: 47
Number of ambiguous tag+V/J-gene combinations: 134
Ignored non-productive alignments: 0 (0%)
Unassigned alignments: 196040 (82.6%)
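The pre-clone assembler numbers can be cross-checked with a few lines of arithmetic. The sketch below copies values from the report above. The readings in the comments (that the histogram percentages are taken over the 5,746 groups that have an assembling feature, and that the "Discarded core alignments" percentage is relative to the core alignments) are inferences from the numbers, not documented MiXCR definitions. The last line shows that about 76% of alignments with an assembling feature sit in groups where no pre-clonotype could be formed, which is at least consistent with the UMI-collision explanation.

```python
# Values copied from the pre-clone assembler report above.
input_groups = 5747
groups_without_feature = 1
preclones_per_group = {0: 3111, 1: 1469, 2: 1047, 3: 119}  # histogram counts
reported_preclones = 3920

alignments_with_feature = 236105
core_alignments = 41259
discarded_core = 194846
zero_preclone_feature_seqs = 178768

# Inferred: the histogram covers only groups that carry an assembling feature.
groups_with_feature = input_groups - groups_without_feature
assert sum(preclones_per_group.values()) == groups_with_feature

for k, count in preclones_per_group.items():
    print(f"{k} pre-clonotypes: {count / groups_with_feature:.2%} of groups")

# A group with k pre-clonotypes contributes k pre-clones to the total.
total_preclones = sum(k * c for k, c in preclones_per_group.items())
print("pre-clones:", total_preclones, "reported:", reported_preclones)

# Core plus discarded alignments account for every alignment with a feature.
print("core + discarded == with feature:",
      core_alignments + discarded_core == alignments_with_feature)
print(f"discarded / core: {discarded_core / core_alignments:.2%}")
print(f"feature sequences in zero-pre-clonotype groups: "
      f"{zero_preclone_feature_seqs / alignments_with_feature:.1%}")
```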
Yes, that looks like an issue with the UMIs. If you can share a FASTQ file, it's straightforward to export the alignments together with their UMIs and the list of CDR3s for each UMI. In general, though, 7 nt is very short; around 12 nt is usually recommended. With this data, I would recommend analyzing it without UMIs.
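Once alignments and their UMIs have been exported (the export step and its column layout are not shown here and depend on the MiXCR export options chosen), checking for collisions amounts to grouping CDR3s by UMI. A minimal sketch, assuming the (UMI, CDR3) pairs are already in memory; the example pairs are made up:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cdr3s_per_umi(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each UMI to the set of distinct CDR3 sequences observed with it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for umi, cdr3 in pairs:
        groups[umi].add(cdr3)
    return groups


# Made-up (UMI, CDR3) pairs standing in for an alignment export.
pairs = [
    ("ACGTACG", "CASSLGQGAYEQYF"),
    ("ACGTACG", "CASSLGQGAYEQYF"),
    ("ACGTACG", "CASSPGTSGYTF"),
    ("TTGCAAT", "CASSIRSSYEQYF"),
]

groups = cdr3s_per_umi(pairs)
# A UMI seen with more than one distinct CDR3 cannot yield a single consensus.
collided = {umi: sorted(cdr3s) for umi, cdr3s in groups.items() if len(cdr3s) > 1}
print(f"{len(collided)} of {len(groups)} UMIs carry more than one distinct CDR3")
print(collided)
```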
Is there a parameter to turn off the UMI-based pre-consensus? Would it be a single parameter for analyze, or would I have to run each step separately? I am guessing it would be the generic-amplicon preset?
Yes, you can simply use the preset without UMIs and use a tag pattern to trim the first seven nucleotides of R2 (the UMI) to facilitate alignment:
--tag-pattern '^(R1:*)\^N{7}(R2:*)'
The (R2:*) group keeps the remainder of R2, in case you want to use that part of the sequence. Does R2 cover part of the V gene?
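To make the intent of the suggested pattern concrete, here is a toy Python regex that does to R2 what the R2 part of the tag pattern is meant to do: skip the first 7 bases and keep the rest. It assumes reads contain only A/C/G/T/N and is only an illustration of the effect on a single read; it is not how MiXCR parses or applies tag patterns.

```python
import re
from typing import Optional

# Toy analogue of the R2 part of '^(R1:*)\^N{7}(R2:*)': anchored at the start
# of R2, match 7 arbitrary bases without keeping them, and keep everything
# after them as the R2 sequence passed on to alignment.
R2_PATTERN = re.compile(r"^[ACGTN]{7}([ACGTN]*)$")


def trim_r2(r2: str) -> Optional[str]:
    """Return R2 without its first 7 bases, or None if R2 is shorter than 7 nt."""
    m = R2_PATTERN.match(r2)
    return m.group(1) if m else None


print(trim_r2("ACGTACGTTGGCA"))  # first 7 nt dropped, remainder kept
print(trim_r2("ACGT"))           # too short to contain the 7-nt prefix
```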
I believe R2 overlaps only a small part of the V gene (18 bp), but I will specify the tag pattern and try to include it in the analysis. Thank you!
Of course! Let me know if you have any other questions.
Another question: if I still wanted to make use of the UMIs, is it possible to loosen the threshold for UMI-based consensus? For example, if 10 or more CDR3 sequences share the same UMI, that UMI would be discarded, but UMIs with fewer CDR3s would still be included in clonotype assembly.
For comparison, when TRUST4 finds multiple CDR3s for a single cell (UMI), it treats the most abundant CDR3 as the true CDR3 for that chain and the less abundant ones as secondary.
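The rule proposed in the question can be sketched as follows. This is neither a MiXCR option nor TRUST4's code: it is a minimal Python version of the heuristic as described, assuming (UMI, CDR3) pairs are already available. The cutoff of 10 comes from the question; the example data are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical cutoff taken from the question: a UMI with this many or more
# distinct CDR3s is treated as collided and dropped. Not a MiXCR parameter.
MAX_DISTINCT_CDR3 = 10


def resolve_umis(pairs: List[Tuple[str, str]], max_distinct: int = MAX_DISTINCT_CDR3):
    """Group (UMI, CDR3) pairs by UMI. For each UMI with fewer than max_distinct
    distinct CDR3s, keep the most abundant CDR3 as primary and the others as
    secondary; return the UMIs at or above the cutoff separately as dropped."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for umi, cdr3 in pairs:
        counts[umi][cdr3] += 1

    resolved: Dict[str, dict] = {}
    dropped: List[str] = []
    for umi, cdr3_counts in counts.items():
        if len(cdr3_counts) >= max_distinct:
            dropped.append(umi)
            continue
        # most_common() sorts by count; ties keep first-seen order.
        ranked = [cdr3 for cdr3, _ in cdr3_counts.most_common()]
        resolved[umi] = {"primary": ranked[0], "secondary": ranked[1:]}
    return resolved, dropped


# Made-up example: one UMI dominated by a single CDR3, one UMI with 10 distinct CDR3s.
example = [("ACGTACG", "CASSLGQGAYEQYF")] * 3 + [("ACGTACG", "CASSPGTSGYTF")]
example += [("TTGCAAT", f"CDR3_{i}") for i in range(10)]
resolved, dropped = resolve_umis(example)
print(resolved)
print(dropped)
```

Keeping the secondary CDR3s instead of discarding them leaves room to decide later whether they are PCR errors or genuine collisions.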
From my understanding, if I specify the generic-amplicon preset, clustering is based only on the similarity of the assembling gene feature (which I specified as CDR3), and no UMI-based consensus is built. I would still like to make use of the UMI sequences in the reads to some degree, without having to sacrifice so many alignments in the process.
Answered in #1256
Hello,
I am trying to decide whether the new version of MiXCR is compatible with our TCR-seq pipeline. Our TCR-seq pipeline is based on the paper "RNase H-dependent PCR-enabled T Cell Receptor sequencing (rhTCRseq) for Highly Specific and Efficient Targeted Sequencing of T Cell Receptor mRNA for Single-Cell and Repertoire Analysis"
The read structure resembles the following form:
This is the command I used to run MiXCR:
mixcr analyze generic-amplicon-with-umi \
    --species hsa \
    --library imgt \
    --rna \
    --rigid-left-alignment-boundary \
    --floating-right-alignment-boundary C \
    --tag-pattern '^(R1:*)\^(UMI:N{7})' \
    ${R1_file} \
    ${R2_file} \
    /mixcr_OUTPUT/${filename_woExt}/${filename_woExt}
The problem I have encountered is that, compared with our existing pipeline, which uses the old MiXCR (v2), the unique clonotype output differs significantly.
I am aware that there is a UMI-correction and pre-consensus step, as well as a level of CDR3 clustering based on nucleotide mismatch thresholds (2 nt mismatches or 1 indel).
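The distance criterion mentioned here (up to 2 mismatches, or a single indel) can be expressed as a small similarity test between two CDR3 nucleotide sequences. This is only a sketch of the threshold as stated in the post: MiXCR's actual clustering step applies additional criteria, and this code does not reproduce it. The sequences are made up.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def one_indel_apart(a: str, b: str) -> bool:
    """True if deleting exactly one base from the longer sequence yields the shorter."""
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def within_threshold(a: str, b: str, max_mismatches: int = 2) -> bool:
    """Equal lengths: allow up to max_mismatches substitutions.
    Lengths differing by one: allow a single indel. Anything else: not similar."""
    if len(a) == len(b):
        return hamming(a, b) <= max_mismatches
    return one_indel_apart(a, b)


print(within_threshold("TGTGCCAGC", "TGAGCCTGC"))  # two substitutions
print(within_threshold("TGTGCCAGC", "TGTGCAGC"))   # one deletion
print(within_threshold("TGTGCCAGC", "AGAGCCTGC"))  # three substitutions
```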
But the difference I am seeing is very noticeable, and I am worried that my setup might be incorrect. I am also confused by the high rate of unassigned alignments in clonotype assembly, which might explain the overall decrease in the number of unique clonotypes identified. What might cause such a high rate of unassigned alignments during clonotype assembly? This was visible in all 4 samples on which I ran tests.
Successfully aligned reads: 97.60% [OK]
Off target (non TCR/IG) reads: 0.61% [OK]
Reads with no V or J hits: 1.77% [OK]
Reads with no barcode: 0.0% [OK]
Alignments that do not cover CDR3: 0.50% [OK]
Tag groups that do not cover CDR3: 0.017% [OK]
Barcode collisions in clonotype assembly: 20.29% [ALERT]
Unassigned alignments in clonotype assembly: 82.59% [ALERT]
Reads used in clonotypes: 16.48% [ALERT]
Alignments dropped due to low sequence quality: 0.0% [OK]
Alignments clustered in PCR error correction: 0.0% [OK]
Clonotypes clustered in PCR error correction: 0.0% [OK]
Clones dropped in post-filtering: 0.0% [OK]
Alignments dropped in clones post-filtering: 0.0% [OK]
Reads dropped in tags error correction and filtering: 1.51% [OK]
UMIs artificial diversity eliminated: 11.85% [OK]
Reads dropped in UMI error correction and whitelist: 0.0% [OK]
Reads dropped in tags filtering: 1.51% [OK]