nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Could I concatenate sequencing data from multiple runs and basecall using Dorado? #369

Closed weishwu closed 11 months ago

weishwu commented 12 months ago

I have some historical ONT runs that were basecalled with these models:

run    sample  base_call_model
4031P  A-1     2021-05-05_dna_r9.4.1_promethion_384_dd219f32
4092P  A-1     dna_r10.4.1_e8.2_400bps_hac@v3.5.2
4211P  A-1     dna_r10.4.1_e8.2_400bps_hac@v4.2.0

I'd like to combine them to achieve high depth for variant calling and methylation calling. I have the FAST5 files for all of them. Could I concatenate the FAST5 files, convert them to POD5, and then use Dorado to do modified basecalling with the latest model? Would there be any incompatibility issues between Dorado (or its models) and the old chemistry?

Thanks.

sklages commented 12 months ago

That will not work, and IMHO you don't need to. If you'd like to do modified basecalling, just do it separately for each run's dataset. Afterwards, take the BAM files, align the complete dataset to your reference genome, and then use modkit for methylation counts, or any other tool for SNP calling or other tasks.
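
The workflow described above could be sketched roughly as follows. This is only a planning helper that builds the command lines and prints them, without running anything. The model placeholders, data directories, reference file, and output file names are all hypothetical and need to be replaced with values that match each run's chemistry; whether the `5mCG_5hmCG` modified-base option is available for a given model is also an assumption to check against the Dorado model list.

```python
import shlex

# Hypothetical inputs: one (model, data directory) pair per run.
# The model strings are placeholders, not real model names; pick a model
# that matches each run's chemistry.
RUNS = {
    "4031P": ("MODEL_FOR_4031P", "data/4031P"),
    "4092P": ("MODEL_FOR_4092P", "data/4092P"),
    "4211P": ("MODEL_FOR_4211P", "data/4211P"),
}


def basecall_cmd(model, data_dir, mods="5mCG_5hmCG"):
    """Command for modified basecalling of a single run (BAM goes to stdout)."""
    return ["dorado", "basecaller", model, data_dir, "--modified-bases", mods]


def build_plan(runs, reference, prefix="A-1"):
    """Return a list of (argv, stdout_file_or_None) steps for the whole workflow."""
    plan, bams = [], []
    # 1. Basecall every run separately, each with its own model.
    for run_id, (model, data_dir) in runs.items():
        bam = f"{run_id}.mod.bam"
        plan.append((basecall_cmd(model, data_dir), bam))
        bams.append(bam)
    # 2. Merge the per-run unaligned BAMs into one dataset.
    merged = f"{prefix}.merged.bam"
    plan.append((["samtools", "merge", "-o", merged, *bams], None))
    # 3. Align the complete dataset to the reference.
    aligned = f"{prefix}.aligned.bam"
    plan.append((["dorado", "aligner", reference, merged], aligned))
    # 4. Sort and index, which modkit needs for its pileup.
    sorted_bam = f"{prefix}.aligned.sorted.bam"
    plan.append((["samtools", "sort", "-o", sorted_bam, aligned], None))
    plan.append((["samtools", "index", sorted_bam], None))
    # 5. Methylation counts across the merged dataset.
    plan.append((["modkit", "pileup", sorted_bam, f"{prefix}.methylation.bed"], None))
    return plan


def render(plan):
    """Turn the plan into shell command lines, adding '>' redirects where needed."""
    lines = []
    for argv, stdout in plan:
        line = shlex.join(argv)
        if stdout is not None:
            line += " > " + shlex.quote(stdout)
        lines.append(line)
    return lines


for command_line in render(build_plan(RUNS, "reference.fa")):
    print(command_line)
```

Printing the commands first makes it easy to review the per-run model choices before committing GPU time to basecalling.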

weishwu commented 12 months ago

Hi @sklages, thanks for your reply. I see your point about methylation counts. For SNP calling I use Clair3, which takes into account the model used for basecalling. I asked about this on the Clair3 GitHub, and the answer was simply that I could not combine runs basecalled with different models.