nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Summary file for custom barcodes #682

Closed: CGD-Helix closed this issue 5 months ago

CGD-Helix commented 5 months ago

Hello,

I am having difficulty generating a summary report after using dorado basecaller to basecall and demultiplex my POD5 files with custom barcodes. I would like to use the following command to perform both steps at once, and then generate a summary file from its output:

dorado basecaller hac <path_to_pod5>  --barcode-arrangement <path_to_toml_file> --barcode-sequences <path_to_fasta_file> > output.bam

My Arrangement files:

For testing purposes, I created a custom arrangement .toml file for NB01 -> NB05 of the SQK-NBD114-96 kit.

[arrangement]
name = "Native"
kit = "NB"

mask1_front = "AGGTTAA"
mask1_rear = "CAGCACCT"
mask2_front = "ATTGCTAAGGTTAA"
mask2_rear = "CAGCACC"

# Barcode sequences
barcode1_pattern = "NB%02i"
barcode2_pattern = "NB%02i"
first_index = 1
last_index = 5

The corresponding sequence file:

>NB01
CACAAAGACACCGACAACTTTCTT
>NB02
ACAGACGACTACAAACGGAATCGA
>NB03
CCTGGTAACTGGGACACAAGACTC
>NB04
TAGGGAAACACGATAGAATCCGAA
>NB05
AAGGTTACACAAACCCTGGACAAG
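
One sanity check worth running on a custom kit is that every barcode name implied by the arrangement file has a matching record in the sequence file. The following is a minimal sketch of that check, not part of dorado: the function names are hypothetical, and it assumes the pattern uses printf-style integer formatting, as `NB%02i` above suggests.

```python
from __future__ import annotations


def expected_barcode_names(pattern: str, first_index: int, last_index: int) -> list[str]:
    """Expand a printf-style pattern such as "NB%02i" over an inclusive index range."""
    return [pattern % i for i in range(first_index, last_index + 1)]


def fasta_record_names(fasta_text: str) -> list[str]:
    """Return the first whitespace-delimited token of every '>' header line."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' with no name
                names.append(tokens[0])
    return names


def missing_barcodes(pattern: str, first_index: int, last_index: int, fasta_text: str) -> list[str]:
    """List the expected barcode names that have no record in the FASTA text."""
    present = set(fasta_record_names(fasta_text))
    expected = expected_barcode_names(pattern, first_index, last_index)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Values copied from the arrangement and sequence files in this post.
    fasta = (
        ">NB01\nCACAAAGACACCGACAACTTTCTT\n"
        ">NB02\nACAGACGACTACAAACGGAATCGA\n"
        ">NB03\nCCTGGTAACTGGGACACAAGACTC\n"
        ">NB04\nTAGGGAAACACGATAGAATCCGAA\n"
        ">NB05\nAAGGTTACACAAACCCTGGACAAG\n"
    )
    print(missing_barcodes("NB%02i", 1, 5, fasta))  # prints []
```

An empty list means the two files agree on names; it says nothing about whether the mask or barcode sequences themselves are correct.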

Basecalling and demultiplexing work fine, and when viewing the BAM file I can see that the classified reads are tagged with their barcode, e.g.

RG:Z:921d112a33266fd29089d1e6c9dfa86a706837fe_dna_r10.4.1_e8.2_400bps_hac@v4.3.0_Native_barcode02
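
As a rough illustration of how the barcode can be read back from such a tag, the sketch below tallies reads per barcode from SAM text lines (for example, the output of `samtools view output.bam`). It is not dorado's summary logic, and the helper names are hypothetical. It assumes the barcode is the final underscore-delimited field of the `RG:Z:` value and begins with `barcode`, which holds for the tag shown above but is not a documented guarantee; anything else is counted as unclassified.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional


def barcode_from_sam_line(line: str) -> Optional[str]:
    """Return the barcode suffix of the RG:Z: tag, or None if there is none."""
    fields = line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags follow the 11 mandatory SAM columns
        if tag.startswith("RG:Z:"):
            suffix = tag[len("RG:Z:"):].rsplit("_", 1)[-1]
            # Assumption: classified read groups end in "barcodeNN".
            return suffix if suffix.startswith("barcode") else None
    return None


def count_barcodes(sam_lines: Iterable[str]) -> Counter:
    """Count alignment records per barcode, skipping header and blank lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        counts[barcode_from_sam_line(line) or "unclassified"] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for demonstration; only the RG value comes from this post.
    rg = ("RG:Z:921d112a33266fd29089d1e6c9dfa86a706837fe_"
          "dna_r10.4.1_e8.2_400bps_hac@v4.3.0_Native_barcode02")
    mandatory = ["read", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "!!!!"]
    demo = ["\t".join(mandatory + [rg]), "\t".join(mandatory)]
    for name, n in sorted(count_barcodes(demo).items()):
        print(f"{name}\t{n}")
```

To run it on real data, pass an iterable of SAM lines, such as `sys.stdin` when piping from `samtools view`, to `count_barcodes`.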

However, when I try to generate a summary of output.bam, I get the following error:

$ dorado summary output.bam 
filename        read_id run_id  channel mux     start_time      duration        template_start  template_duration        sequence_length_template        mean_qscore_template    barcode
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)

This does not occur with an established ONT barcoding kit passed via --kit-name; in that case, summary file generation works fine. Are there extra steps I must perform to generate the summary file when using a custom kit?

Thank you, Conor

Run environment:

tijyojwad commented 5 months ago

Hi @CGD-Helix - this is already being worked on internally and will be fixed in the next release. Apologies for the hassle!

CGD-Helix commented 5 months ago

Hi @tijyojwad - No worries, that's great, thanks!