hasindu2008 / f5c

Ultra-fast methylation calling and event alignment tool for nanopore sequencing data (supports CUDA acceleration)
https://hasindu2008.github.io/f5c/docs/overview
MIT License

Difference between f5c eventalign and resquiggle, event segmentation with resquiggle #157

Open maximilianmordig opened 3 months ago

maximilianmordig commented 3 months ago

Hello, I am wondering about the difference between eventalign and resquiggle.

As I understand it from https://hasindu2008.github.io/f5c/docs/output, resquiggle aligns a nucleotide sequence to a raw signal (without needing an alignment to a reference), whereas eventalign uses a BAM file to find the reference subsequence that the basecalled read aligns to, and then aligns that reference subsequence (rather than the basecalled read) to the raw read signal. The result is that ref_seq[read_alignment_start+start_kmer:read_alignment_start+end_kmer) aligns to raw_signal[start_raw:end_raw), as described by the ss tag, where read_alignment_start is the read alignment start reported in the BAM file and [a, b) denotes the half-open interval. That is, the coordinates start_kmer and end_kmer in eventalign are reported relative to read_alignment_start.

Along the way, I also noticed that the strand of the f5c resquiggle is always +, but this cannot be inferred because no alignment bam file is given, so it would make sense to drop this column. Moreover, is it possible to perform event segmentation without providing a bam file, i.e. with resquiggle?

hasindu2008 commented 3 months ago

Hello

Yes, your understanding is correct.

resquiggle - align a raw signal to the basecalled read
eventalign - align a raw signal to the reference genome/transcriptome

I did not completely understand your start_kmer, end_kmer question - do you refer to the two columns in the PAF format? The start_kmer and end_kmer columns in PAF for eventalign are the kmer coordinates in the reference sequence to which the signal maps (see the example figure at https://hasindu2008.github.io/f5c/docs/output#positive-strand with the example PAF).

Yes, for resquiggle as we align the signal to the basecalled read, the direction is always +. Here it means that the signal maps to the basecalled read (not the reverse complement of the basecalled read). So in this context it is the signal mapping with respect to the basecalled read which is always + (not to be confused with the direction in BAM which is the read mapping with respect to the reference). This is mainly to conform with the PAF format (dropping would mess up the columns).
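The point about column positions can be illustrated with a small sketch. PAF is positional, so a parser that reads the strand from the fifth column would misread every later field if that column were dropped. The field names start_raw, end_raw, start_kmer and end_kmer follow the names used in this thread; the remaining field names, the helper parse_paf_line, and the example line are hypothetical, and the ss tag value is a placeholder rather than real f5c output.

```python
from typing import Dict, NamedTuple


class PafRecord(NamedTuple):
    """First 12 positional PAF columns plus optional tags (names are illustrative)."""
    read_id: str
    signal_len: int
    start_raw: int
    end_raw: int
    strand: str
    target_id: str
    target_len: int
    start_kmer: int
    end_kmer: int
    matches: int
    block_len: int
    mapq: int
    tags: Dict[str, str]


def parse_paf_line(line: str) -> PafRecord:
    """Parse one tab-separated PAF line purely by column position."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
    # Optional tags have the form NAME:TYPE:VALUE, e.g. ss:Z:<string>.
    tags = {}
    for item in fields[12:]:
        name, _type, value = item.split(":", 2)
        tags[name] = value
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]), fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]), tags,
    )


# Made-up line for illustration only; not real f5c output.
EXAMPLE_LINE = "\t".join([
    "read_1", "5000", "120", "4800", "+",
    "read_1", "400", "0", "395", "395", "395", "255",
    "ss:Z:PLACEHOLDER",
])

record = parse_paf_line(EXAMPLE_LINE)
print(record.strand, record.start_raw, record.end_raw, record.start_kmer, record.end_kmer)
```

Because every field is looked up by index, keeping a constant + in the strand column lets the same parser handle both resquiggle and eventalign PAF output.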

Both resquiggle and eventalign perform event segmentation as the first step. Those events are then aligned to the basecalled read (resquiggle) or to the reference (eventalign).

Hope it is clear?

maximilianmordig commented 3 months ago

Yes, it is clear. So start_kmer, end_kmer are not relative to the alignment start position, but absolute. I missed that from the example. So I was wondering if resquiggle can also output the event segmentation, similarly to eventalign.
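To make the absolute versus relative distinction concrete, here is a toy sketch for a positive-strand eventalign record. The reference string, the coordinates, the k-mer length of 3, and the value of read_alignment_start are all made up for illustration, and the function kmers_in_span is hypothetical and not part of f5c.

```python
from typing import List


def kmers_in_span(ref_seq: str, start_kmer: int, end_kmer: int, k: int) -> List[str]:
    """Return the k-mers whose start positions lie in the half-open interval
    [start_kmer, end_kmer) of ref_seq (positive strand, start_kmer < end_kmer)."""
    return [ref_seq[i:i + k] for i in range(start_kmer, end_kmer)]


# Toy values, assumed for illustration only.
ref_seq = "ACGTACGTAGCTAGCTAGGA"
read_alignment_start = 5   # alignment start as it would be reported in the BAM file
start_kmer, end_kmer = 8, 12
k = 3

# Absolute interpretation (the one confirmed in this thread):
# the coordinates index directly into the reference sequence.
absolute = kmers_in_span(ref_seq, start_kmer, end_kmer, k)

# Relative interpretation (the earlier misunderstanding):
# the coordinates are shifted by the alignment start, giving different k-mers.
relative = kmers_in_span(
    ref_seq, read_alignment_start + start_kmer, read_alignment_start + end_kmer, k
)

print(absolute)  # ['AGC', 'GCT', 'CTA', 'TAG']
print(relative)  # ['GCT', 'CTA', 'TAG', 'AGG']
```

The two interpretations select different k-mers from the same reference, so adding the alignment start to coordinates that are already absolute would shift the signal mapping onto the wrong k-mers.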

hasindu2008 commented 3 months ago

Do you mean the columns like the event mean and event standard deviation columns in the eventalign TSV output?

hasindu2008 commented 1 week ago

Hello, has this issue/question been resolved?