nanoporetech / remora

Methylation/modified base calling separated from basecalling.
https://nanoporetech.com

io_read.seq, ref_seq and signal orientation for reverse mapping #178

Open yuxinPenny opened 2 weeks ago

yuxinPenny commented 2 weeks ago

Dear authors of Remora,

I am trying to extract features from the basecalls (io_read.seq), the reference sequence (io_read.ref_seq) and the signal of the io_read object generated from POD5 and BAM files.

I am confused about the orientation of these features for reads mapped to the minus strand. Should I reverse some of them so that they are all in the same orientation?

I would appreciate it if you could help!

marcus1487 commented 2 weeks ago

All information in an io.Read object is "read-oriented", meaning it is anchored to the basecalled sequence in the 5' to 3' direction. So for a read mapped to the minus strand, ref_seq will be the reverse complement of the sequence found in the reference FASTA file. The signal is stored in the same 5' to 3' orientation. So for RNA reads, which are sequenced in the 3' to 5' direction, the io.Read signal will be flipped relative to the POD5 storage format. This format allows the easiest extraction of training examples, which is the primary target of the Remora repo. I hope this helps clarify a bit, and please respond if further clarification is required.
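To illustrate the read-oriented convention for the reference sequence, here is a minimal sketch in plain Python. It does not call Remora; `revcomp` and `read_oriented_ref` are hypothetical helper names introduced only for this example, and the input window is made up.

```python
# Illustrative only: plain Python, no Remora calls.
# revcomp and read_oriented_ref are hypothetical helpers for this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_oriented_ref(fasta_ref_seq: str, is_reverse: bool) -> str:
    """Convert a FASTA-oriented reference window to read orientation.

    For a plus-strand mapping the window is returned unchanged; for a
    minus-strand mapping it is reverse complemented, so that it lines up
    with the basecalled sequence read 5' to 3'.
    """
    return revcomp(fasta_ref_seq) if is_reverse else fasta_ref_seq


fasta_window = "AACGT"  # made-up reference window as stored in the FASTA
print(read_oriented_ref(fasta_window, is_reverse=False))  # AACGT
print(read_oriented_ref(fasta_window, is_reverse=True))   # ACGTT
```

Under this convention, the basecalls and the read-oriented reference can be compared position by position without any further reversal.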

yuxinPenny commented 2 weeks ago

Thanks!

So, do we need to determine the direction of RNA reads (5' to 3' or 3' to 5') ourselves and flip the signal manually?

marcus1487 commented 2 weeks ago

It depends on the goal of your analysis. There is an attribute, reverse_signal, which indicates if the read signal has been processed as 3' to 5' signal. If you can detail your analysis goals I can help figure out if you need to reverse certain arrays associated with an io.Read object.
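As a rough sketch of how such a flag could be used, the snippet below restores the original storage order of a signal only when it was reversed. `ToyRead` is a hypothetical stand-in, not the Remora io.Read class; only the attribute name reverse_signal comes from this thread, and the `signal` field is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyRead:
    """Hypothetical stand-in for a read object; NOT the Remora io.Read class."""
    signal: List[float]    # assumed field: signal in read (5' to 3') orientation
    reverse_signal: bool   # True if the signal was flipped from storage order


def signal_in_storage_order(read: ToyRead) -> List[float]:
    """Return a copy of the signal in its original (storage) order."""
    return read.signal[::-1] if read.reverse_signal else list(read.signal)


rna_like = ToyRead(signal=[1.0, 2.0, 3.0], reverse_signal=True)
dna_like = ToyRead(signal=[1.0, 2.0, 3.0], reverse_signal=False)
print(signal_in_storage_order(rna_like))  # [3.0, 2.0, 1.0]
print(signal_in_storage_order(dna_like))  # [1.0, 2.0, 3.0]
```

Whether this step is needed at all depends on the analysis. If the signal is flipped back, any array that indexes into the signal would presumably need a matching adjustment.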

wangziyuan66 commented 4 days ago

Hi Marcus,

I wonder whether I need to modify the sd_params parameter (4, 3, 0.5) for RNA DTW when reverse_signal=True.

Best