milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Question: Does mixcr take sequence stagger into consideration? #1710

Closed: omegahh closed this issue 5 months ago

omegahh commented 5 months ago

To improve sequencing quality, I add a stagger to my PCR primers, in the form "N{2:4}{PRIMER}". My protocol also integrates a UMI tag into the DNA library.

After the align and refineTagsAndSort steps, reads that share the same UMI can still be shifted relative to each other by a few base pairs because of the stagger in the library. Does the algorithm account for such read shifts among sequences with the same UMI during the assembly step?

mizraelson commented 5 months ago

Yes, MiXCR will resolve this structure, but it should be correctly passed using the tag pattern. For example:

^N{2:4}(UMI:N{12})attgccgatc(R1:*)

The pattern above tells MiXCR that the sequence might start with 2 to 4 random nucleotides, followed by the UMI group of 12 nucleotides. The attgccgatc sequence will be used as an anchor (e.g., SSP sequence), and the rest of the read will be treated as a payload.
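To illustrate why an anchored pattern makes the stagger harmless, here is a minimal Python sketch that mimics the pattern with a regular expression. It is only an analogy and not MiXCR's actual tag-parsing code; the function name `parse_read` and the example UMI, stagger, and payload sequences are made up for the illustration.

```python
import re

# Anchor sequence from the tag pattern above, upper-cased for matching.
ANCHOR = "ATTGCCGATC"

# Regex analogue of ^N{2:4}(UMI:N{12})attgccgatc(R1:*):
# 2 to 4 stagger bases, a 12-nt UMI, the anchor, then the payload.
READ_PATTERN = re.compile(r"^[ACGTN]{2,4}([ACGTN]{12})" + ANCHOR + r"(.*)$")


def parse_read(read):
    """Return (umi, payload), or None if the read does not fit the pattern."""
    match = READ_PATTERN.match(read.upper())
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical reads: same UMI and payload, different stagger lengths.
umi = "ACGTACGTACGT"
payload = "GGGTTTCCCAAA"
reads = [stagger + umi + ANCHOR + payload for stagger in ("AC", "TGA", "CATG")]

parsed = [parse_read(read) for read in reads]
for read, result in zip(reads, parsed):
    print(read, "->", result)
```

Because the anchor fixes where the UMI ends, all three reads yield the same UMI and the same payload regardless of how many stagger bases precede them, so they can be grouped together downstream.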

Let me know if you need help with creating the correct pattern.