nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Question about basecalling and alignment #753

Closed nikj26 closed 2 months ago

nikj26 commented 2 months ago

Issue Report

Hi, I was wondering what default filtering is in place for the basecaller and aligner. Would I need to do extra filtering, such as a samtools view -F 3844 scheme, to filter out non-primary alignments? Or do the basecaller and aligner already have filtering in place? This would be used for basecalling modified bases and then aligning to a reference sequence.

Run environment:

tijyojwad commented 2 months ago

Hi @nikj26 - dorado doesn't do any filtering of alignments. It outputs primary, secondary and supplementary alignments. However, if there are modified-base tags in a record, those are only retained for primary alignments (unless the -Y argument is passed to dorado aligner, which forces soft clipping for all alignments).
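For reference, the 3844 mask mentioned in the question can be broken down into the individual SAM flag bits it excludes. The snippet below is only a rough sketch: the bit values come from the SAM specification, while `decode_flag` and `is_kept` are hypothetical helper names made up for illustration and are not part of dorado or samtools.

```python
# SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(mask):
    """Return the names of all flag bits set in `mask` (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


def is_kept(record_flag, exclude_mask=3844):
    """Mimic an exclude filter: keep a record only if it has none of the masked bits."""
    return (record_flag & exclude_mask) == 0


print(decode_flag(3844))
# ['unmapped', 'secondary', 'qc_fail', 'duplicate', 'supplementary']
print(is_kept(16))    # primary alignment on the reverse strand -> True
print(is_kept(2064))  # supplementary alignment on the reverse strand -> False
```

So the mask drops unmapped, QC-fail and duplicate records in addition to secondary and supplementary ones. Whether that is wanted depends on the downstream analysis.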

nikj26 commented 2 months ago

Hi @tijyojwad, thank you for the feedback. So would you say that after dorado aligner's handling of the modified-base tags there is no need for additional filtering? I am new to using dorado and want to make sure I am analyzing my data correctly. My end goal, after basecalling and aligning, is to use modkit to get bedMethyl tables.

tijyojwad commented 2 months ago

Hi @nikj26 - dorado doesn't do any filtering on alignments per se, it just makes sure the reads and mod tags are in agreement with each other. If you need mod tags for supplementary/secondary alignments, make sure to add the -Y option.

The output aligned BAM should work with modkit.

selmapichot commented 2 months ago

@tijyojwad following the filtering topic, do you have any advice on best practice for filtering after dorado basecalling + alignment, for unmodified bases? Many thanks.

tijyojwad commented 2 months ago

MinKNOW uses the following q-score based thresholds for filtering reads:

  1. --min-qscore 9 for HAC model
  2. --min-qscore 10 for SUP model

However, my suggestion would be to look at the q-score distribution of your data and determine what threshold makes sense for your use case.
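As a rough illustration of inspecting the distribution, the sketch below computes a per-read mean q-score from Phred qualities and reports what fraction of reads would pass the two thresholds above. It rests on some assumptions: the mean is taken in error-probability space (a common convention, and the exact calculation dorado or MinKNOW uses may differ, for example by trimming leading bases), qualities use the standard Phred+33 encoding, and `mean_qscore`, `phred_from_fastq` and `qscore_summary` are hypothetical helper names rather than any tool's API.

```python
import math
import statistics


def phred_from_fastq(qual_string, offset=33):
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual_string]


def mean_qscore(quals):
    """Mean q-score of one read, averaged over error probabilities, not raw scores."""
    if not quals:
        return 0.0
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)


def qscore_summary(read_scores, thresholds=(9, 10)):
    """Summarise per-read mean q-scores and the fraction passing each threshold."""
    if not read_scores:
        return {"n": 0}
    summary = {"n": len(read_scores), "median": statistics.median(read_scores)}
    for t in thresholds:
        passing = sum(1 for s in read_scores if s >= t)
        summary[f"frac_ge_{t}"] = passing / len(read_scores)
    return summary


# Toy per-read mean q-scores; real values would come from your own reads.
scores = [8.0, 9.5, 10.2, 12.0]
print(qscore_summary(scores))
```

Because the average is taken over error probabilities, a few low-quality bases pull a read's mean q-score down more than a plain arithmetic mean of the scores would.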