Closed Papareddy closed 3 months ago
Dear Ranj,
(Not the developer)
You can see how I've implemented Modkit for this question in this notebook: https://github.com/Theo-Nelson/SMS-colab/blob/main/C8/HCV_AND_METTL3_KO_ANALYSIS.ipynb
Unfortunately, it seems like there are places which call both pseudo-Uridine and m6a simultaneously - this indicates to me at least that there is some kind of bug. I have not had a chance to track this further.
Sincerely, Theo
Hello @Papareddy,
It depends what you're trying to test with your experiment. If you want to find pseU positions, you could try using modkit pileup
and looking for genome positions with high modification percentages. If you tell me what you're doing, maybe I can give you a little more advice.
@Theo-Nelson,
Do you mean that there are genomic positions with a bedmethyl record for pseU and m6A like this?
ctg_01 39 40 a 1 + 39 40 255,0,0 1 0.00 0 1 0 5 119 211 0
ctg_01 39 40 17802 209 + 39 40 255,0,0 209 22.49 47 162 0 5 119 3 0
and
ctg_01 79 80 a 285 + 79 80 255,0,0 285 18.25 52 233 0 6 32 10 0
ctg_01 79 80 17802 6 + 79 80 255,0,0 6 16.67 1 5 0 6 32 289 0
The explination for this is as follows, take the first example. The reference has a U
(T
in the FASTA) at position 39, 209 reads have valid pseU calls and 1 has a m6A call, the one read with the m6A call has a T->A mismatch and there is a single, canonical call at that position. In the second example, the reverse is true, there is a reference A
and 285 reads reporting a potential m6A modification call, and 6 reads with a A->T mismatch. Usually this mismatch records have much lower valid coverage then the "correct" modification record. If you want to remove them, I'd recommend using awk
either as a pipe from the output or as a secondary step (in case you want to count the mismatches later).
e.g.
$ modkit pileup ${mod_bam} - | awk '$4=="17802"' > pileup_pseU.bed
# or
$ modkit pileup ${mod_bam} pileup_all_mods.bed
$ awk '$4==17802' pileup_all_mods.bed > pileup_pseU.bed
Hope this helps, happy to answer any additional questions.
Hello @Papareddy,
Feel free to re-open this issue if you have any additional questions.
Hi,
Thanks for this nice piece of software.
With Dorado 0.7, pseudouridination can be predicted. I was wondering if you have any tips and manual on how to identify PseU from the bam file generated after basecalling Modkit?
Cheers, Ranj