Closed happier21 closed 4 months ago
Could you see the mapq distribution of the results from minimap2 for guppy and dorado results separately?
I am trying to see if the mapqs are really low in the minimap2 result for dorado (likely) or whether f5c is not counting the mapq right.
An example bash one liner Samtools view file.bam | cut -f5 | sort | uniq -c | sort -nr -k1,1
samtools view dorado.data.sorted.bam |cut -f 5|sort |uniq -c |sort -nr -k1,1
samtools view guppy.data.sorted.bam |cut -f 5|sort |uniq -c |sort -nr -k1,1
It seems that something went wrong when running dorado basecaller that caused mappqs to be too low
Yeh, it seems that f5c correctly reports that there are low mapq reads. If you used the same minimap2 command to map the reads, it is the basecaller that could be causing this.
Which model did you use for Guppy?
Is this 4KHz data or 5KHz data? Is the sequencing kit lsk114?
I know the problem. My own data is r10, but the sample data I downloaded when exploring the process is r9. However, I have been using r10 model to basecalling, and I have changed to r9 model to run. It turns out to be right. Thank you very much for your help; Thank you very much for your prompt response to my questions each time
This is the full log of the f5c call-methylation:
This is dorado basecaller's order: dorado basecaller /share/home/yzwl_hanxs/app/dorado-0.5.3-linux-x64/model/dna_r10.4.1_e8.2_400bps_sup@v4.1.0 ./pod5/ | amtools view -bhS -@ 10 > test.bam Convert bam to fastq: samtools fastq -0 test.fastq test.bam Use minimap2 to align: minimap2 -a -x map-ont /share/home/yzwl_hanxs/refdata-gex-GRCh38-2020-A/fasta/genome.fa test.fastq | samtools sort -o test.sorted.bam -T test.tmp This is the log for minimap2: Why does this happen Thank you, ShengquanWang