Closed Yuan2yuan2yuan closed 6 months ago
I checked more reads and found that the lengths of the paired sequences (event sequence and move sequence) differ, and it is not simply that one is always longer than the other. I used sigtk to segment events, and it uses the same code as Scrappie, so they should have the same length. Do you know the reason, and how did you fix it?
I've uploaded a template read's result in a public repository, you can take a look.
Scrappie/Uncalled4 event detection is a low-level raw signal preprocessing step which groups similar stretches of signal into events, attempting to split events only at k-mer boundaries. It does hardly any signal filtering, except for occasional filtering of obvious current spikes during Uncalled4 event detection, and it doesn't attempt to distinguish template DNA from adapters/barcodes or other signal noise. The Guppy move table doesn't use this type of "event"; rather, it simply groups the signal into fixed-length blocks (e.g. five raw time points) and annotates them by which basecalled nucleotides they approximately correspond to. This fixed block length means many "moves" will span multiple k-mers, and they are very often off by one or more bases. Uncalled4 aligns Scrappie-style events guided by the move table, but only incorporates this information after event detection.
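To make the fixed-block layout concrete, here is a minimal sketch of how a move table can be turned into approximate per-base signal start positions. It assumes the "mv" tag layout described in this thread (first element is the block length, the remaining elements are 0/1 flags where 1 means the block starts a new base) and that "ts" is the sample offset of the first block; the function name `moves_to_base_starts` is made up for illustration and is not part of Guppy or Uncalled4.

```python
def moves_to_base_starts(mv, ts=0):
    """Return the approximate raw-sample start of each basecalled base.

    mv: list of ints, e.g. the "mv" BAM tag: [stride, m0, m1, ...]
        (assumed layout: stride first, then one 0/1 flag per block).
    ts: sample offset where the first block begins (the "ts" tag).
    """
    stride, moves = mv[0], mv[1:]
    starts = []
    for block_index, move in enumerate(moves):
        if move:  # a nonzero move marks the first block of a new base
            starts.append(ts + block_index * stride)
    return starts


# Six blocks of 5 samples each, three bases called.
example_mv = [5, 1, 0, 1, 1, 0, 0]
print(moves_to_base_starts(example_mv, ts=10))
```

Because every boundary lands on a multiple of the block length, these positions are only as precise as the stride, which is why the number of moves and the number of detected events generally will not match.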
If you just want to identify the "template" DNA region, then you can use the "template start" ("ts") BAM tag to find the start, and the end can be computed from the number of moves ("mv" tag length - 1) multiplied by the move block length (the first element in the "mv" tag, usually 5), added to the start. If you want more accurate base-by-base alignments, then that is what Uncalled4 is for! If you want to get started with Uncalled4, I recommend using uncalled4 align ... --tsv-out, which will output a TSV file with reference-to-signal alignment coordinates.
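The template-region arithmetic above can be sketched as follows. This is only an illustration under the assumptions stated in this thread: "ts" is the raw-sample offset of the first move block and "mv" holds the block length followed by one entry per block. The helper name `template_bounds` is hypothetical; with pysam the two tags could be read via `read.get_tag("ts")` and `read.get_tag("mv")`.

```python
def template_bounds(ts, mv):
    """Return (start, end) raw-sample coordinates of the basecalled region.

    ts: value of the "ts" tag (sample where the move table begins).
    mv: value of the "mv" tag: [stride, m0, m1, ...] (assumed layout).
    The end coordinate is exclusive.
    """
    stride = mv[0]          # move block length, usually 5
    n_moves = len(mv) - 1   # the first element is the stride, not a move
    start = ts
    end = ts + n_moves * stride
    return start, end


# Three blocks of 5 samples starting at sample 10.
start, end = template_bounds(10, [5, 1, 0, 1])
print(start, end)
```

Events that fall outside this range (for example adapter signal before the start) have no corresponding moves, so they can be dropped before comparing the two sequences.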
Thanks for your reply, I understand.
Hi, I found that the event sequence given by Scrappie is longer than the "move" sequence given by Guppy. How did you filter out the events that aren't mapped?