Closed: omegahh closed this issue 2 months ago
Hi,
Have you tried the following option with the mixcr analyze command?
--assemble-clonotypes-by [{FR1Begin:CDR1End},{CDR2Begin:FR4End}]
Also, the combination
--tag-parse-unstranded --tag-pattern "^(UMI:NNNNtNNNNtNNNN)tN{7:8}(R1:*)\^N{17}(R2:*)"
doesn't quite make sense, because there are no anchor points to determine whether 7 or 8 nucleotides should be skipped. The same applies to --tag-parse-unstranded, as there is no sequence to determine which read the UMI is located in. It should be, for example:
^(UMI:NNNNtNNNNtNNNN)tN{7:8}atgggct(R1:*)\^N{17}(R2:*)
or simply:
^(UMI:N{14})N{7}(R1:*)\^N{17}(R2:*)
without --tag-parse-unstranded, but in that case you have to be sure the UMI is always in R1.
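To illustrate the anchor point issue, here is a rough analogy using Python's re module. It is only a sketch: the regular expressions are my approximation of the tag patterns, not MiXCR's actual pattern engine, and the reads, the spacer, and the helper build_read are made up for the example.

```python
import re

# Regex approximations of the tag patterns (an assumption, not MiXCR's engine).
UMI = r"([ACGT]{4}T[ACGT]{4}T[ACGT]{4})"
no_anchor = re.compile("^" + UMI + r"T[ACGT]{7,8}(.*)$")
with_anchor = re.compile("^" + UMI + r"T[ACGT]{7,8}ATGGGCT(.*)$")


def build_read(spacer, insert):
    # Hypothetical read layout: UMI, a fixed T, a spacer, the anchor, the insert.
    return "ACGTTACGTTACGT" + "T" + spacer + "ATGGGCT" + insert


read7 = build_read("CCCCCCC", "GATTACA")   # 7-nt spacer
read8 = build_read("CCCCCCCC", "GATTACA")  # 8-nt spacer

for name, read in (("7-nt spacer", read7), ("8-nt spacer", read8)):
    r1_plain = no_anchor.match(read).group(2)
    r1_anchored = with_anchor.match(read).group(2)
    print(name, "| no anchor:", r1_plain, "| anchored:", r1_anchored)
```

Without the anchor, the matcher has nothing to tell a 7-nt spacer from an 8-nt one, so in this sketch it always skips 8 and the captured R1 starts at a different position in the two reads. With the anchor, both reads yield the same R1.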
Alternatively, set -Massemble.clnaOutput=true and then use mixcr assembleContigs to extend the sequence as much as possible.
My command for alignment is:
My library is produced with a RACE-with-UMI protocol that I designed myself and is sequenced with a PE300 strategy. Thus, in theory, it covers the full VTranscriptWithP sequence, including 5UTR, L1/L2, and VDJRegion. However, there is a gap around FR2, as shown in the following:
If I use "VDJRegion", many clones are discarded even though they still have valid sequences in CDR2_TO_FR4. However, if I use CDR2_TO_FR4, exportClones lacks sequence information for FR1/CDR1.
I tried using a mix-in option like "[FR1,CDR1,CDR2_TO_FR4]" during alignment, but the software reported that the order was incorrect, and it seems to scramble the order itself.
I also tried parameters like "{FR1Begin:CDR1End}+{CDR2Begin:FR4End}", but I could not understand the resulting error messages.
How should I handle this situation? I want to make the most of the gene regions that the sequencing data can cover, such as FR1+CDR1+CDR2_TO_FR4.
Additionally, does MiXCR support a flexible strategy? If a clone covers a large area, the assembled region would be wide enough to cover more gene features; if the coverage is short, fewer gene features would be used for assembly. For example, I could list possible gene features from complete to partial, like [VDJRegion, FR1+CDR2_TO_FR4, CDR2_TO_FR4], and the software would try them one by one, which could help preserve as many clones and gene features as possible.
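To make the idea concrete, here is a rough Python sketch of the fallback logic I have in mind. It is not an existing MiXCR feature as far as I know: the coordinates in FEATURES are invented, and CANDIDATES, is_covered and pick_feature are hypothetical names used only for illustration.

```python
# Hypothetical feature boundaries on a reference coordinate system (made-up numbers).
FEATURES = {
    "FR1": (0, 75), "CDR1": (75, 99), "FR2": (99, 150),
    "CDR2": (150, 174), "FR3": (174, 288), "CDR3": (288, 330),
    "FR4": (330, 363),
}

# Candidates ordered from most complete to most partial.
# Each candidate is a list of (first feature, last feature) segments.
CANDIDATES = [
    ("VDJRegion", [("FR1", "FR4")]),
    ("{FR1Begin:CDR1End}+{CDR2Begin:FR4End}", [("FR1", "CDR1"), ("CDR2", "FR4")]),
    ("{CDR2Begin:FR4End}", [("CDR2", "FR4")]),
]


def is_covered(segment, covered):
    # A segment is covered if one contiguous covered interval spans it entirely.
    start = FEATURES[segment[0]][0]
    end = FEATURES[segment[1]][1]
    return any(c0 <= start and end <= c1 for c0, c1 in covered)


def pick_feature(covered):
    # Return the first (widest) candidate whose segments are all covered.
    for name, segments in CANDIDATES:
        if all(is_covered(s, covered) for s in segments):
            return name
    return None


print(pick_feature([(0, 363)]))              # full coverage
print(pick_feature([(0, 110), (140, 363)]))  # gap inside FR2
print(pick_feature([(120, 363)]))            # only the 3' part is covered
```

In this sketch a clone with a gap inside FR2 falls back to the two-segment candidate instead of being discarded, and a clone covering only the 3' part falls back to CDR2 through FR4.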
Looking forward to your reply :)