Hi, I have a few questions about best practices.
1) I have 5 replicates that I merged into 1 BAM. The merged BAM has ~300M alignments, including secondary alignments. Should I train on the entire BAM, or on some percentage or number of reads? Should I discard secondary alignments first?
2) Based on the ribotish quality output, the 28 nt reads have good periodicity (~85% frame1), while the 27 nt reads have bad periodicity (~50% frame1, ~50% frame2). Should I apply a uniform offset to the 28 nt reads and learned offsets to the 27 nt reads, or should I apply learned offsets to all of them? My worry is that, since the 28 nt reads are already good, adding variable offsets might dampen that signal.
3) If I want to use this with Ribotish, should I train on the transcriptome alignment or the genome alignment, or does it not matter?
I suggest using all of the data for training. Down-sample only if the training step takes too long to finish.
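If down-sampling does become necessary, the following is a minimal sketch of one way to do it on SAM text (for example, the output of `samtools view -h`). It is not part of the tool discussed here; the function names `keep_record` and `filter_sam`, the default values, and the hashing scheme are all assumptions made for illustration. Dropping secondary alignments is shown only as an option, since the reply above does not say whether it is needed. Hashing the read name keeps or drops all alignments of a read together. `samtools view` can do similar filtering natively with `-F 256` (exclude secondary) and `-s` (subsample).

```python
import hashlib

SECONDARY = 0x100  # SAM FLAG bit for "not primary alignment"


def keep_record(sam_line, fraction=1.0, drop_secondary=False):
    """Return True if this SAM line should be kept (hypothetical helper)."""
    if sam_line.startswith("@"):
        return True  # always keep header lines
    fields = sam_line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    if drop_secondary and flag & SECONDARY:
        return False
    # Map the read name to a stable value in [0, 1) so that every
    # alignment of the same read gets the same keep/drop decision.
    digest = hashlib.md5(name.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < fraction


def filter_sam(lines, fraction=1.0, drop_secondary=False):
    """Yield only the SAM lines selected by keep_record."""
    for line in lines:
        if keep_record(line, fraction, drop_secondary):
            yield line
```

A possible use is to loop over `filter_sam(sys.stdin, fraction=0.2)` and write each line to `sys.stdout`, with samtools converting from and back to BAM on either side of the pipe.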
The quality of your data seems exceptional. If most reads in your library are 28 nt, it is fine to discard the other read lengths and use a uniform offset for the 28 nt reads. Regarding your concern: in our experience, if the phasing of a read length is already very high, the proportion of in-frame reads stays similar after training.
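To make the "mostly 28 nt, well phased" check concrete, here is a small sketch that tabulates the frame distribution per read length and lists the lengths whose dominant frame exceeds a cutoff. The input format (pairs of read length and 0-based frame), the function names, and the 0.7 cutoff are assumptions for illustration only; they are not taken from ribotish or from the training tool.

```python
from collections import Counter, defaultdict


def frame_fractions(observations):
    """Fraction of reads in frames 0, 1, 2 for each read length.

    observations: iterable of (read_length, frame) pairs, frame in {0, 1, 2}.
    """
    counts = defaultdict(Counter)
    for length, frame in observations:
        counts[length][frame] += 1
    fractions = {}
    for length, per_frame in counts.items():
        total = sum(per_frame.values())
        fractions[length] = [per_frame[f] / total for f in range(3)]
    return fractions


def well_phased_lengths(fractions, min_dominant=0.7):
    """Read lengths whose most common frame holds at least min_dominant
    of the reads (the cutoff is an arbitrary assumption)."""
    return sorted(
        length for length, fr in fractions.items() if max(fr) >= min_dominant
    )


# Toy counts that mirror the numbers in the question:
# 28 nt reads are ~85% in one frame, 27 nt reads are split ~50/50.
toy = (
    [(28, 0)] * 85 + [(28, 1)] * 10 + [(28, 2)] * 5
    + [(27, 0)] * 50 + [(27, 1)] * 50
)
print(well_phased_lengths(frame_fractions(toy)))
```

With these toy counts only the 28 nt reads pass the cutoff, which matches the suggestion above to keep that length and give it a uniform offset.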
Training is always performed with the transcriptome alignment, while prediction can be performed with either.
thank you!