nanoporetech / dorado

Oxford Nanopore's Basecaller

https://nanoporetech.com/

Question about basecalling and alignment #753

Closed nikj26 closed 2 months ago

nikj26 commented 2 months ago

Issue Report

Hi, I was wondering what default filtering is in place for the basecaller and aligner. Would I need to do extra filtering, such as samtools view -F 3844, to filter out non-primary alignments? Or do the basecaller and aligner already have filtering in place? This would be used for basecalling modified bases and then aligning to a reference sequence.

Run environment:

Dorado version: 0.6.0+7a6ab9ad
Dorado command: basecaller and aligner
Operating system:
Hardware (CPUs, Memory, GPUs):
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.):
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

tijyojwad commented 2 months ago

Hi @nikj26 - dorado doesn't do any filtering of alignments. It'll output primary, secondary and supplementary alignments. However, if there are modified base tags in a record, those are only retained for primary alignments (unless the -Y argument is passed to dorado aligner, which forces soft clipping for all alignments).
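
For readers unsure what the 3844 mask from the question would remove, here is a minimal Python sketch that decodes a SAM FLAG value into its bits. The bit values come from the SAM specification. The short names in SAM_FLAGS and the helpers decode_flag and is_excluded are hypothetical, chosen for illustration, and are not part of dorado or samtools.

```python
# SAM FLAG bits as defined in the SAM specification.
# The short names are illustrative labels, not official identifiers.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(value):
    """Return the names of the bits set in a SAM FLAG or filter mask."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]


def is_excluded(record_flag, mask=3844):
    """Mimic an exclude filter: drop a record if it has ANY bit in the mask."""
    return bool(record_flag & mask)


# Which categories does the mask remove?
print(decode_flag(3844))

# A forward primary (0), a reverse-strand primary (16),
# a supplementary (2048) and a secondary (256) alignment.
for flag in (0, 16, 2048, 256):
    print(flag, "excluded" if is_excluded(flag) else "kept")
```

So the mask covers more than non-primary alignments: it also removes unmapped, QC-failed and duplicate-flagged records. Whether that is desirable depends on the downstream analysis.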

nikj26 commented 2 months ago

Hi @tijyojwad, thank you for the feedback. So would you say that after dorado aligner handles the modified base tags there is no need for additional filtering? I am new to using dorado and want to make sure I am analyzing my data correctly. The end goal, after basecalling and aligning, is to use modkit to get bedMethyl tables.

tijyojwad commented 2 months ago

Hi @nikj26 - dorado doesn't do any filtering on alignments per se; it just makes sure the reads and mod tags are in agreement with each other. If you need mod tags for supplementary/secondary alignments, make sure to add the -Y option.

The output aligned BAM should work with modkit.

selmapichot commented 2 months ago

@tijyojwad, following on from the filtering topic, do you have any advice on best practice for filtering after dorado basecalling + alignment for unmodified bases? Many thanks.

tijyojwad commented 2 months ago

MinKNOW uses the following q-score-based thresholds for filtering reads:

--min-qscore 9 for HAC model
--min-qscore 10 for SUP model

However, my suggestion would be to look at the q-score distribution of your data and determine what threshold makes sense for your use case.
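
One way to look at the q-score distribution is sketched below in Python. It assumes a per-read mean q-score is obtained by averaging error probabilities and converting back to the Phred scale. Whether dorado or MinKNOW computes its per-read value in exactly this way is an assumption here, and the function names and the example scores are made up for illustration.

```python
import math
import statistics


def mean_qscore(quals):
    """Mean Phred score of one read, averaged in error-probability space."""
    if not quals:
        raise ValueError("read has no quality values")
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)


def fraction_passing(read_qscores, threshold):
    """Fraction of reads whose mean q-score is at or above the threshold."""
    if not read_qscores:
        raise ValueError("no reads supplied")
    return sum(q >= threshold for q in read_qscores) / len(read_qscores)


def summarize(read_qscores, thresholds=(9, 10)):
    """Print the median and the share of reads kept at each threshold."""
    print(f"reads: {len(read_qscores)}")
    print(f"median q-score: {statistics.median(read_qscores):.2f}")
    for t in thresholds:
        kept = fraction_passing(read_qscores, t)
        print(f"min q-score {t}: {kept:.1%} of reads kept")


# Made-up per-base qualities for a few reads, for illustration only.
example_reads = [
    [12, 14, 9, 20, 18],
    [7, 8, 6, 9, 10],
    [25, 22, 30, 28, 19],
    [10, 11, 9, 12, 10],
]
summarize([mean_qscore(r) for r in example_reads])
```

Because the average is taken over error probabilities, a few low-quality bases pull the read's mean q-score down more than a plain arithmetic mean of the Phred values would. Running the summary at several candidate thresholds shows how many reads each one would discard.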