maximilianmordig closed this issue 3 months ago.
Hello,
Yes, your understanding is correct.
resquiggle - align a raw signal to the basecalled read
eventalign - align a raw signal to the reference genome/transcriptome
I did not completely understand your start_kmer, end_kmer question. Do you refer to the two columns in the PAF format? The start_kmer and end_kmer columns in the PAF output of eventalign are the k-mer coordinates in the reference sequence to which the signal maps (see the example figure at https://hasindu2008.github.io/f5c/docs/output#positive-strand together with the example PAF).
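To make the absolute-coordinate point concrete, here is a minimal Python sketch that reads one line of f5c PAF output and lists the reference k-mers the signal covers. It assumes the 12 mandatory columns follow the layout in the f5c output documentation (read_id, len_raw_signal, start_raw, end_raw, strand, target id, len_kmer, start_kmer, end_kmer, matches, len_kmer, mapq), that coordinates are 0-based and half-open, and it only handles the + strand. The function names and the example record are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SignalAlignment:
    """Mandatory PAF columns as (assumed to be) laid out by f5c."""
    read_id: str
    len_raw_signal: int
    start_raw: int
    end_raw: int
    strand: str
    target_id: str      # read_id for resquiggle, reference contig for eventalign
    len_kmer: int
    start_kmer: int
    end_kmer: int
    tags: dict = field(default_factory=dict)


def parse_paf_line(line: str) -> SignalAlignment:
    cols = line.rstrip("\n").split("\t")
    # Optional SAM-style tags (e.g. the ss tag) follow the 12 mandatory columns.
    tags = {}
    for col in cols[12:]:
        name, _type, value = col.split(":", 2)
        tags[name] = value
    return SignalAlignment(
        read_id=cols[0],
        len_raw_signal=int(cols[1]),
        start_raw=int(cols[2]),
        end_raw=int(cols[3]),
        strand=cols[4],
        target_id=cols[5],
        len_kmer=int(cols[6]),
        start_kmer=int(cols[7]),
        end_kmer=int(cols[8]),
        tags=tags,
    )


def covered_kmers(aln: SignalAlignment, target_seq: str, k: int) -> list[str]:
    """K-mers of the target covered by the signal (+ strand only).

    start_kmer/end_kmer are used directly as absolute positions in the
    target sequence; no alignment start offset is added.
    """
    if aln.strand != "+":
        raise NotImplementedError("sketch only handles the + strand")
    return [target_seq[i:i + k] for i in range(aln.start_kmer, aln.end_kmer)]


# Hypothetical record and reference, for illustration only.
record = "read1\t1000\t10\t900\t+\tchr1\t12\t5\t9\t4\t12\t255\tss:Z:placeholder"
aln = parse_paf_line(record)
print(covered_kmers(aln, "ACGTACGTACGTAC", k=3))  # prints ['CGT', 'GTA', 'TAC', 'ACG']
```

The key line is range(aln.start_kmer, aln.end_kmer): the coordinates index the reference directly, with no read alignment start from the BAM file added.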
Yes, for resquiggle, since we align the signal to the basecalled read, the direction is always +. Here it means that the signal maps to the basecalled read (not to the reverse complement of the basecalled read). So in this context, the strand describes the signal mapping with respect to the basecalled read, which is always + (not to be confused with the strand in the BAM file, which is the read mapping with respect to the reference). The column is kept mainly to conform to the PAF format; dropping it would shift all the remaining columns.
Both resquiggle and eventalign perform event segmentation as the first step. The resulting events are then aligned to the basecalled read (resquiggle) or to the reference (eventalign).
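As a rough illustration of what event segmentation means here, the sketch below cuts a raw signal wherever a two-window t-statistic peaks above a threshold, then summarises each event by its mean and standard deviation (the kind of values reported in the eventalign TSV). This is a simplified toy, not f5c's detector: f5c's own event detection (adapted from nanopolish) uses its own window sizes, thresholds and peak finding. The function names and the w/threshold defaults are hypothetical.

```python
import math
from statistics import mean, pstdev, pvariance


def t_statistics(signal: list[float], w: int) -> list[float]:
    """|t| comparing the w samples before index i with the w samples from i on."""
    t = [0.0] * len(signal)
    for i in range(w, len(signal) - w + 1):
        left, right = signal[i - w:i], signal[i:i + w]
        # Small epsilon avoids division by zero on perfectly flat windows.
        denom = math.sqrt((pvariance(left) + pvariance(right)) / w + 1e-12)
        t[i] = abs(mean(right) - mean(left)) / denom
    return t


def segment(signal: list[float], w: int = 3, threshold: float = 4.0) -> list[dict]:
    """Cut the signal at local maxima of t above threshold; summarise each event."""
    if not signal:
        return []
    t = t_statistics(signal, w)
    boundaries = [0]
    for i in range(1, len(signal) - 1):
        if t[i] > threshold and t[i] >= t[i - 1] and t[i] > t[i + 1]:
            boundaries.append(i)
    boundaries.append(len(signal))
    events = []
    for start, end in zip(boundaries, boundaries[1:]):
        seg = signal[start:end]
        events.append({"start": start, "length": end - start,
                       "mean": mean(seg), "stdv": pstdev(seg)})
    return events


# Synthetic three-level signal (not real data): boundaries expected at 8 and 16.
signal = [100.0] * 8 + [120.0] * 8 + [90.0] * 8
for ev in segment(signal):
    print(ev)
```

Comparing adjacent windows rather than single samples keeps isolated noisy samples from being treated as event boundaries; real nanopore signal needs tuned parameters.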
Hope it is clear?
Yes, it is clear. So start_kmer and end_kmer are not relative to the alignment start position, but absolute. I missed that in the example.
So I was wondering whether resquiggle can also output the event segmentation, similarly to eventalign.
Do you mean the columns like the event mean and event standard deviation columns in the eventalign TSV output?
Hello, has this issue/question been resolved?
I am closing this issue as there isn't a response. Feel free to reopen if you need anything.
Hello, I am wondering about the difference between eventalign and resquiggle. As I understand it from https://hasindu2008.github.io/f5c/docs/output, resquiggle aligns a nucleotide sequence to a raw signal (without needing an alignment to a reference), whereas eventalign uses a BAM file to find the reference subsequence that the basecalled read aligns to, and then aligns that reference subsequence (rather than the basecalled read) to the raw signal, such that ref_seq[read_alignment_start+start_kmer:read_alignment_start+end_kmer) aligns to raw_signal[start_raw:end_raw) as described by the ss tag, where read_alignment_start is the read alignment start reported in the BAM file and [a, b) denotes the half-open interval. That is, the coordinates start_kmer and end_kmer in eventalign are reported relative to read_alignment_start.
Along the way, I also noticed that the strand in the f5c resquiggle output is always +, but this cannot be inferred because no alignment BAM file is given, so it would make sense to drop this column. Moreover, is it possible to perform event segmentation without providing a BAM file, i.e. with resquiggle?