Closed hyunjokoo closed 11 months ago
Hello @hyunjokoo,
Is it possible that all of your reads are unmapped? Could you try
samtools view -F 4 call_sup_5mCG_5hmCG.bam | wc -l
The recommended way to map the reads is with dorado aligner
docs. If you don't have a reference sequence to map to, you can use modkit extract
docs to inspect the base modification information in your modBAM.
Hope this helps, happy to answer any additional questions.
The downloaded modkit executable file name has centos7. Is it ok to run modkit in Ubuntu, too? For now both Centos and Ubuntu server gave me the same error message.
Yes, the build is for Centos because it uses an older and more portable libc.
Thank you for your quick reply.
1. As you suggested, I checked it, and there was 0 mapped read.
I used the whole data (about 285 G of pod5 file size) for dorado basecaller
to find methylated site, but I could not find any mapped read from the modBAM file.
Is it normal that I cannot find any methylated site? Two tested samples were from insect and plant.
2.
I have another question. You mentioned dorado aligner
, but I feel that dorado aligner
does not help to find methylated site. Is it correct? I think dorado aligner
is just like what we can do minimap2
using the fastq sequence converted from pod5 files.
3. Could you please tell me if there is any downloadable example pod5 file (hac or sup) that I can test to detect methylated sites using dorado?
Thank you.
HJK
Hello @hyunjokoo,
If you're interested in finding methylated bases in a reference genome, you need to map your reads onto the reference. I recommend using dorado aligner
for this task because it will not drop or corrupt the MM
and ML
BAM tags that contain the modified base information. Once you have this aligned BAM, you can use it with modkit pileup
and get a table of how many reads report methylation of each type at each reference position.
If, on the other hand, you don't have a reference genome, you can use modkit extract
and modkit summary
to see how much methylation is detected in your sample.
You cannot find methylated positions without mapping to a reference genome, however, you can only get read-level information (with extract
) or summary statistics.
If this isn't clear, maybe you could you help me understand what you're trying to do.
Thank you for your explanation. Now I understood that I need to align to a reference, first to do modkit pileup
. If you see my first command in my first thread, no reference was used at that time. That's why modkit pileup
failed.
Is there any good site that explains MM and ML tags? Or could you please explain them?
Thank you.
Hello @hyunjokoo,
There is a specification document here that explains the MM and ML tags.
Thank you!!
Hi I am having the exact same error come up. I have basecalled my pod5s using
dorado basecaller hac,5mC_5hmC,6mA ~/chem10/data --kit-name SQK-NBD114-24 >~/chem10/results/test_calls.bam
demultiplexed using
dorado demux --output-dir ~/chem10/results/ --no-classify ~/chem10/results/test_calls.bam
and aligned using:
dorado aligner -Y ~/chem10/references/pDEL152.fa ~/chem10/results/SQK-NBD114-24_barcode01.bam > ~/chem10/results/BC1_aligned.bam
But upon trying to pileup using:
modkit pileup ~/chem10/results/BC1_aligned.bam ~/chem10/results/BC1pileup.bed --log-filepath ~/chem10/results/pileupBC1.log
I get the error
Error! unable to open SAM/BAM/CRAM index for /home/strathsci/chem10/results/BC1_aligned.bam; please create an index
During alignent I can see 557 reads basecalled and only 5 of these are listed as unmapped so I believe there should be an index. The sequencing is of a short 5.7kb plasmid if that makes a difference.
Hello,
Question 1.
I prepared bam file from pod5 file.
This is how I prepared bam file. [hjk@PC24B090:/data_ngs3/hjk/data/nanopore_dorado_test]$
dorado basecaller /data_ngs3/hjk/utils/dorado/dorado-0.4.2-linux-x64/dna_r10.4.1_e8.2_400bps_sup@v4.2.0 nanopore_raw_2023-11-08_sup/ --modified-bases 5mCG_5hmCG > call_sup_5mCG_5hmCG.bam
[2023-11-10 11:28:44.891] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.2.0_5mCG_5hmCG@v3.1 with httplib [2023-11-10 11:28:51.850] [info] > Creating basecall pipeline [2023-11-10 11:29:11.691] [warning] Maximum safe estimated batch size is only 64. [2023-11-10 11:29:11.693] [warning] Maximum safe estimated batch size is only 64. [2023-11-10 11:29:11.715] [info] - set batch size for cuda:0 to 64 [2023-11-10 11:29:11.725] [info] - set batch size for cuda:1 to 64 [2023-11-10 11:33:59.227] [info] > Simplex reads basecalled: 4003 [2023-11-10 11:33:59.227] [info] > Basecalled @ Samples/s: 1.007219e+06 [2023-11-10 11:38:21.760] [info] > FinishedThen, I tried to use modkit. [hjk@ngs7:/data_ngs3/hjk/data/nanopore_dorado_test]$
/data_ngs2/hjk/utils/modkit/modkit_v0.2.2_centos7_x86_64/modkit pileup call_sup_5mCG_5hmCG.bam call_sup_5mCG_5hmCG.bam_modkit-pileup.bed --log-filepath call_sup_5mCG_5hmCG.bam_modkit-pileup.log
So I indexed bam file, and ran again. [hjk@ngs7:/data_ngs3/hjk/data/nanopore_dorado_test]$
samtools index call_sup_5mCG_5hmCG.bam
[hjk@ngs7:/data_ngs3/hjk/data/nanopore_dorado_test]$
/data_ngs2/hjk/utils/modkit/modkit_v0.2.2_centos7_x86_64/modkit pileup call_sup_5mCG_5hmCG.bam call_sup_5mCG_5hmCG.bam_modkit-pileup.bed --log-filepath call_sup_5mCG_5hmCG.bam_modkit-pileup.log
It said that my bam index is wrong. Could you please explain the detail how I should prepare the required input files for modkit?
Question 2
The downloaded modkit executable file name has centos7. Is it ok to run modkit in Ubuntu, too? For now both Centos and Ubuntu server gave me the same error message.
Thank you.
HJK