Open yuxinPenny opened 2 weeks ago
All information in an io.Read object is "read-oriented", meaning it is anchored to the basecalled sequence in the 5' to 3' direction. So for a read mapped to the minus strand, the ref_seq will be the reverse complement of the sequence found in the reference FASTA file. The signal follows the same 5' to 3' orientation. So for RNA reads, which are sequenced in the 3' to 5' direction, the io.Read signal will be flipped relative to the POD5 storage format. This format allows the easiest extraction of training examples, which is the primary target of the Remora repo. I hope this helps clarify a bit, and please respond if further clarification is required.
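To make the read-oriented convention above concrete, here is a minimal toy sketch. It is not Remora code: the helper names and the example values are hypothetical, and it only illustrates how a minus-strand reference sequence relates to the FASTA sequence and how a 3' to 5' signal relates to its stored order.

```python
# Toy illustration only; these helpers are hypothetical and not part of Remora.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_oriented_ref_seq(fasta_seq: str, is_reverse: bool) -> str:
    """Reference sequence expressed in the read's 5' to 3' orientation."""
    return reverse_complement(fasta_seq) if is_reverse else fasta_seq


def read_oriented_signal(stored_signal, sequenced_3p_to_5p: bool):
    """Signal expressed in the read's 5' to 3' orientation."""
    return stored_signal[::-1] if sequenced_3p_to_5p else stored_signal


fasta_region = "AACGT"  # forward-strand sequence as written in the FASTA file
print(read_oriented_ref_seq(fasta_region, is_reverse=True))  # ACGTT
print(read_oriented_signal([10, 20, 30], sequenced_3p_to_5p=True))  # [30, 20, 10]
```

In this picture the basecalls, the reference sequence, and the signal all end up in the same 5' to 3' orientation, so no further reversal is needed to compare them with one another.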
Thanks!
So, do we need to determine the direction of RNA reads (5' to 3' or 3' to 5') by ourselves and flip the signal manually?
It depends on the goal of your analysis. There is an attribute, reverse_signal, which indicates if the read signal has been processed as 3' to 5' signal. If you can detail your analysis goals I can help figure out if you need to reverse certain arrays associated with an io.Read object.
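If an analysis does need the signal in its original storage order, one possible approach is sketched below. It uses a hypothetical stand-in class, not the real io.Read: only the name reverse_signal comes from this thread, and where that attribute lives on the actual object and how the signal array is named are assumptions. The interval helper shows that positions in a flipped array must be mirrored, not just the values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyRead:
    """Hypothetical stand-in for a read object; NOT the Remora io.Read class."""

    signal: List[float]   # signal in read (5' to 3') orientation
    reverse_signal: bool  # True if the signal was flipped from storage order


def signal_in_storage_order(read: ToyRead) -> List[float]:
    """Return a copy of the signal in the order it was stored on disk."""
    return read.signal[::-1] if read.reverse_signal else list(read.signal)


def interval_in_storage_order(read: ToyRead, start: int, end: int) -> Tuple[int, int]:
    """Map a half-open [start, end) signal interval back to storage coordinates."""
    if not read.reverse_signal:
        return start, end
    n = len(read.signal)
    return n - end, n - start


read = ToyRead(signal=[1.0, 2.0, 3.0, 4.0, 5.0], reverse_signal=True)
stored = signal_in_storage_order(read)
s, e = interval_in_storage_order(read, 0, 2)
# The same samples are covered in both coordinate systems, in opposite order.
assert stored[s:e] == read.signal[0:2][::-1]
```

Any per-base signal boundaries derived from the read-oriented signal would need the same mirroring before being used against the stored signal.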
Hi Marcus,
I wonder if I need to modify the parameter sd_params (4,3,0.5) for RNA DTW after reverse_signal=True.
Best
Dear authors of Remora,
I am trying to extract features from basecalls (io_read.seq), reference sequence (io_read.ref_seq) and signals from the io_read object generated from pod5 and bam.
I am confused about the orientation of those features for reads mapped to the minus strand. Should I reverse some of them so that they are all in the same orientation?
I would appreciate it if you could help!