nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Using Dorado for Nanopore Mycobacterial Data #760

Closed AzlanNI closed 2 months ago

AzlanNI commented 2 months ago

Issue Report

Please describe the issue: I was using Tombo for our bacterial data before it was deprecated, and it was suggested that I use dorado from now on, so I did. My problem is that with Tombo far more C sites were covered than with dorado. My reference contains 100,658 Cs, but the .bed file that I generated with modbam2bed from my .bam file covers only 16,447 Cs. Many Cs that were covered with Tombo are no longer covered with dorado. Since I am analyzing the methylation of Cs, this is quite an issue for me. Could someone tell me whether I am doing something wrong, or whether the sequencing simply went badly (I am not a lab scientist)? In addition, my .bed file contains 5mC positions where no C occurs in the reference.

Steps to reproduce the issue: Below is the code I used to generate the files. I used this dorado call:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:25 dorado basecaller \
    --modified-bases-models /software/dorado/0.5.1/models/dna_r10.4.1_e8.2_400bps_sup@v4.2.0_5mC@v2,/software/dorado/0.5.1/models/dna_r10.4.1_e8.2_400bps_sup@v4.2.0_6mA@v3 \
    /software/dorado/0.5.1/models/dna_r10.4.1_e8.2_400bps_sup@v4.2.0 \
    /gpfs/project/azlan/Myco_R10/"$input" \
    --reference "$reference_file" >> "${Name}_Aligned.bam"

done

I will also add the modbam2bed script, in case it helps to understand my problem:

for mod_type in 5mC 6mA; do
    ./modbam2bed -e -m "$mod_type" -t 5 "$reference_file" "$dir" > "/gpfs/project/azlan/MycoR10/${Name}${mod_type}.bed"
done

I used the same reference file as for dorado: SP10291_R10IIPC_2n.fasta.gz
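Both observations (the low number of covered Cs, and 5mC rows at positions where the reference has no C) could be checked with something like the following. This is only a minimal sketch: the function names and the toy data are hypothetical, and it assumes the .bed file follows the usual bedMethyl layout (column 2 = 0-based start, column 6 = strand, column 10 = coverage). A C on the reverse strand shows up as a G in the forward reference sequence, so rows with strand `-` would be expected to sit on a G rather than a C, and the reference total should be counted on both strands.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_fasta(lines):
    """Parse FASTA lines into {contig_name: uppercase_sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks).upper()
    return seqs


def count_c_sites(seqs):
    """Return (Cs on the forward strand, Cs on the reverse strand).

    A reverse-strand C is a G in the forward sequence.
    """
    fwd = sum(s.count("C") for s in seqs.values())
    rev = sum(s.count("G") for s in seqs.values())
    return fwd, rev


def tally_bed(lines, seqs, min_cov=1):
    """Tally covered bedMethyl rows by (strand, forward-strand reference base).

    Assumed layout: column 2 = 0-based start, column 6 = strand,
    column 10 = coverage. Rows with coverage below min_cov are skipped.
    """
    tally = Counter()
    for line in lines:
        if line.startswith(("#", "track")):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue
        chrom, start, strand, cov = fields[0], int(fields[1]), fields[5], int(fields[9])
        if cov < min_cov:
            continue
        tally[(strand, seqs[chrom][start])] += 1
    return tally


# Toy data standing in for the real reference and .bed file.
toy_fasta = [">contig1", "ACGTCCGA"]
toy_bed = [
    "contig1\t1\t2\t5mC\t1000\t+\t1\t2\t0,0,0\t12\t8.3",
    "contig1\t2\t3\t5mC\t1000\t-\t2\t3\t0,0,0\t9\t0.0",
    "contig1\t4\t5\t5mC\t1000\t+\t4\t5\t0,0,0\t0\tnan",
]

toy_seqs = read_fasta(toy_fasta)
print("C sites (forward, reverse):", count_c_sites(toy_seqs))
print("covered rows by (strand, reference base):", dict(tally_bed(toy_bed, toy_seqs)))
```

For the real files, the same functions could be fed with `open_text(...)` handles for the reference and for each .bed file. Rows tallied as `('-', 'G')` would then be reverse-strand Cs rather than errors, while rows on A or T would point to a mismatch between the reference and the alignment.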

Run environment:

Logs