Closed kir1to455 closed 2 months ago
Hello @kir1to455,
However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks.
When you have a reference FASTA with many short sequences (like transcripts) the parallelism in modkit
doesn't work ideally. Improving this performance is a top priority for the next release. I'll let you know as soon as this work is completed.
To your second question:
The output default of modkit pileup
as you have it will report all genomic positions with at least 1 modification call. To me it looks like position 231 is not a genomic DRACH motif, however 1 read has a DRACH motif aligned there, the majority of the reads align to position 232.
Hello @kir1to455,
The latest modkit release candidate v0.2.5-rc1 should have much better performance when using a transcriptome reference. Let me know if you have a chance to try it and or if you encounter any problems.
Hi, @ArtRand The latest modkit release is currently performing well in transcriptome reference.
Best wishes, kirito
Hi, I have successfully run Dorado with RNA004 m6A model and used modkit to convert my modbam file to bedmethyl output.
${DoradoDir}/dorado basecaller --min-qscore 7 --verbose --emit-moves -k 14 --secondary=no --reference ${indexDir}/gencode.v43.transcripts.fa -x cuda:0 ${ModelDir}/rna004_130bps_sup@v3.0.1 ${pod5path}/HEK293T.pod5 --modified-bases m6A_DRACH > ${ModDir}/hac.pass.m6A.mod.pass.bam
samtools sort -o hac.pass.m6A.mod.sorted.bam -@ 40 hac.pass.m6A.mod.pass.bam
samtools index -@ 40 hac.pass.m6A.mod.sorted.bam
${modkitDir}/modkit pileup ${bamfile}/hac.pass.m6A.mod.sorted.bam ${bamfile}/hac.pass.m6A.bed --log-filepath ${bamfile}/hac.log --num-reads 20084 --max-depth 20000 -t 35
However, I found that the -t parameter of modkit does not use multiple CPUs to improve the speed of tasks. And Question2 is that I found the Nvalid_cov of the adjacent position of the same transcripts is particularly large. If I don't get it wrong, because the training is DRACH motif. If there are two adjacent A at the same time, one of them is fake. And the the column Nnocall is equivalent to depth. Here, it is clear that 232 is a fake m6A site. But how to construe the Ncanonical = 1 in this fake position?Best wishes, Kirito