nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
132 stars 7 forks source link

How to construe rna004 model rna004_130bps_sup@v3.0.1_DRACH bedMethyl output? #109

Closed kir1to455 closed 2 months ago

kir1to455 commented 8 months ago

Hi, I have successfully run Dorado with RNA004 m6A model and used modkit to convert my modbam file to bedmethyl output. ${DoradoDir}/dorado basecaller --min-qscore 7 --verbose --emit-moves -k 14 --secondary=no --reference ${indexDir}/gencode.v43.transcripts.fa -x cuda:0 ${ModelDir}/rna004_130bps_sup@v3.0.1 ${pod5path}/HEK293T.pod5 --modified-bases m6A_DRACH > ${ModDir}/hac.pass.m6A.mod.pass.bam samtools sort -o hac.pass.m6A.mod.sorted.bam -@ 40 hac.pass.m6A.mod.pass.bam samtools index -@ 40 hac.pass.m6A.mod.sorted.bam ${modkitDir}/modkit pileup ${bamfile}/hac.pass.m6A.mod.sorted.bam ${bamfile}/hac.pass.m6A.bed --log-filepath ${bamfile}/hac.log --num-reads 20084 --max-depth 20000 -t 35 However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks. And Question2 is that I found the Nvalid_cov of the adjacent position of the same transcripts is particularly large. image If I don't get it wrong, because the training is DRACH motif. If there are two adjacent A at the same time, one of them is fake. And the the column Nnocall is equivalent to depth. Here, it is clear that 232 is a fake m6A site. But how to construe the Ncanonical = 1 in this fake position?

Best wishes, Kirito

ArtRand commented 8 months ago

Hello @kir1to455,

However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks.

When you have a reference FASTA with many short sequences (like transcripts) the parallelism in modkit doesn't work ideally. Improving this performance is a top priority for the next release. I'll let you know as soon as this work is completed.

To your second question:

The output default of modkit pileup as you have it will report all genomic positions with at least 1 modification call. To me it looks like position 231 is not a genomic DRACH motif, however 1 read has a DRACH motif aligned there, the majority of the reads align to position 232.

ArtRand commented 8 months ago

Hello @kir1to455,

The latest modkit release candidate v0.2.5-rc1 should have much better performance when using a transcriptome reference. Let me know if you have a chance to try it and or if you encounter any problems.

kir1to455 commented 2 months ago

Hi, @ArtRand The latest modkit release is currently performing well in transcriptome reference.

Best wishes, kirito