nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Empty values for columns "template_start" and "template_duration" in summary for barcoded reads #796

Closed luigilamparelli closed 4 weeks ago

luigilamparelli commented 1 month ago

Issue Report

Please describe the issue:

Hello, I've noticed that, after demultiplexing, some reads have missing values for the columns "template_start" and "template_duration" in the summary produced by dorado summary. The same reads have both columns filled before barcoding, and I expected the same after barcoding and demultiplexing.

Steps to reproduce the issue:

  1. Basecalling without barcoding: dorado basecaller -vv dna_r10.4.1_e8.2_400bps_sup@v4.1.0 ${pod5} > ${bam}
  2. Demultiplexing: dorado demux -vv --output-dir ${out_dir}/ --kit-name SQK-NBD114-96 --emit-summary ${bam}
  3. Summary on barcoded reads: dorado summary ${out_dir}/${barcoded_bam} > summary_barcoded.txt
  4. Summary on the original, non-barcoded reads: dorado summary ${bam} > summary_not_barcoded.txt
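
To list exactly which reads are affected, the summary from step 3 can be scanned for empty cells. The following is a minimal sketch, not part of dorado. It assumes the summary is tab-separated with a header row and a "read_id" column; the function name reads_missing_columns is made up for illustration.

```python
import csv
import io


def reads_missing_columns(summary_text,
                          columns=("template_start", "template_duration")):
    """Return the read IDs that have an empty value in any of `columns`.

    Assumes a tab-separated summary with a header row and a "read_id" column.
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    missing = []
    for row in reader:
        # Short rows yield None for absent fields, so treat None as empty too.
        if any((row.get(col) or "").strip() == "" for col in columns):
            missing.append(row["read_id"])
    return missing
```

For example, reads_missing_columns(open("summary_barcoded.txt").read()) returns the affected read IDs, and the same call on summary_not_barcoded.txt should return an empty list if the columns are only lost after demultiplexing.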

Run environment:

Logs

You will find attached the log files for the 3 dorado commands. In addition, I would like to share the tags of one of the affected reads (0a0eb075-0cd0-41f6-9244-f9b5c4d17e61), both before and after barcoding. There is a difference in two tags:

bam_tags_barcoded.txt bam_tags_not_barcoded.txt dorado_basecaller.log dorado_demux.log dorado_summary.log

Here are the summary files obtained: summary_barcoded.txt summary_not_barcoded.txt

Thanks for the help!

malton-ont commented 1 month ago

Hi @luigilamparelli,

Thanks for raising this. This looks like an issue in calculating the new number of samples in the read after barcode trimming in the demux subcommand - as we're missing the move table at this point, we end up calculating the length as zero.
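
To illustrate why the move table matters here, the sketch below maps a kept base interval back to signal samples. It is not dorado's implementation. It assumes a move table with one 0/1 entry per block of `stride` samples, where a 1 marks the block in which a new base starts; the function name and arguments are hypothetical.

```python
def samples_after_trim(moves, stride, first_base, last_base):
    """Map the kept base interval [first_base, last_base) to signal samples.

    moves: list of 0/1 values, one per block of `stride` samples; a 1 marks
           the block where a new base starts (assumed layout).
    Returns (samples_trimmed_from_start, samples_kept), or None when there is
    no move table, because bases can then not be mapped onto samples.
    """
    if not moves:
        return None
    # Sample offset at which each base starts.
    base_starts = [i * stride for i, m in enumerate(moves) if m == 1]
    total_samples = len(moves) * stride
    if not 0 <= first_base <= last_base <= len(base_starts):
        raise ValueError("base interval is outside the read")
    # An index equal to the number of bases means "end of the signal".
    start = base_starts[first_base] if first_base < len(base_starts) else total_samples
    end = base_starts[last_base] if last_base < len(base_starts) else total_samples
    return start, end - start
```

With a move table present, the trimmed sample count follows directly from the base positions. Without one there is no such mapping, which is consistent with the length coming out as zero in a standalone demux.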

You can avoid this for now by performing the demux in line with basecalling (by providing the --kit-name parameter during basecalling) or by including the move table in the basecalling output (using the --emit-moves flag). Or you can skip trimming entirely with the --no-trim option.
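
For anyone scripting this, the three workarounds can be written out as argument lists. This is a sketch only: the flags are the ones named above, while the function names, dictionary keys and argument order are assumptions made for illustration.

```python
import subprocess


def workaround_commands(model, pod5, bam, out_dir, kit):
    """Build argument lists for the three workarounds (nothing is run here)."""
    return {
        # 1. Demultiplex in line with basecalling.
        "inline_demux": ["dorado", "basecaller", "--kit-name", kit, model, pod5],
        # 2. Keep the move table in the basecalling output, then demux as before.
        "emit_moves": ["dorado", "basecaller", "--emit-moves", model, pod5],
        # 3. Demux the existing BAM but skip trimming entirely.
        "no_trim": ["dorado", "demux", "--output-dir", out_dir,
                    "--kit-name", kit, "--no-trim", bam],
    }


def run_to_file(cmd, out_path):
    """Run a command and write its stdout to out_path (basecaller writes BAM to stdout)."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

Only one of the three is needed. The first two require re-running basecalling, whereas the third works on the existing BAM but leaves the barcode sequences in the reads.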

We'll investigate the best way to resolve this properly for a future release.

tijyojwad commented 4 weeks ago

Hi @luigilamparelli - a fix for this has been released with dorado v0.7.0 and newer.