nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/
Other

Information about basecalls and saving them to pod5 #337

Closed marketanykrynova closed 1 year ago

marketanykrynova commented 1 year ago

Is it possible to store base-called sequences in the FASTQ format into POD5 files using Dorado, similar to how it was achievable with FAST5 files using guppy_basecaller --fast5-out? Additionally, I would like to find out if there is a way to obtain further information about the basecalling of individual reads, such as parameters like start, step, duration, and trace.

vellamike commented 1 year ago

> Is it possible to store base-called sequences in the FASTQ format into POD5 files using Dorado

This is not possible. Dorado only writes data in SAM/BAM/FASTQ format and POD5 is a raw-data format.

> Additionally, I would like to find out if there is a way to obtain further information about the basecalling of individual reads, such as parameters like start, step, duration, and trace.

The SAM tags and the POD5 file itself contain this information (you can use read IDs to relate called sequences to the original POD5). Please let me know if you need more detailed information.

marketanykrynova commented 1 year ago

Thank you for your reply.

As you write:

> SAM tags and the POD5 itself contain this information (You can use read ids to relate called sequences to original POD5). Please let me know if you need more detailed information.

I found an explanation about the read tags (https://github.com/nanoporetech/dorado/blob/master/documentation/SAM.md); however, I am still slightly confused. I have the moves, but I am not sure how to find out when the basecalling process starts. For example, if I have read like this: 63f7d8f5-cdc7-472b-bf07-36061f1d193b 4 0 0 0 0 CTGTAGTCTATTCATTCCCCTGCCTACCAACCACCCACCGTACCCGACACCGTATCCAAAAACTAGATATAACCTTGAGGGAACAACCAAGTACGTGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCAC #$%&&'''''&&$$$$$$%&&%%%%$'&%%&%$%%%$')&$$$$%$&'''(&$$$#$$%%$$##"###$%$#$%%%$$%&'('')(''&%%%'&&'**''&&'&&((&&%%%+,7;<=::98/55)))))/)%$ qs:i:8 du:f:0.761 ns:i:3805 ts:i:10 mx:i:1 ch:i:25 st:Z:2023-07-17T09:47:09.786+00:00 rn:i:20 fn:Z:FAW83864_cc485e85_55102b13_0.pod5 sm:f:96.5111 sd:f:19.318 sv:Z:quantile dx:i:0 RG:Z:55102b131d2a4a2c5a212d28ab06c7d63071f2fd_dna_r10.4.1_e8.2_400bps_sup@v4.2.0 mv:B:c,6,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,1,0,1,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,1,1,1,1,0,0,1,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,0,0,0,0,0,0,1,0,0,1,0,1,0,1,1,0,1,0,0,0,1,1,0,1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,0,1,1,0,0,0,1,1,1,0,0,0,
0,0,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,0,0,1,1,0,1,0,1,1,0,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1,1,0,1,0,1,0,0,0,0,1,1

How can I determine the start position (the sample at which basecalling started) and the step size between individual samples? (In the older chemistry, the parameter was 'step - 2.')

tijyojwad commented 1 year ago

The ts tag tells you how many samples were trimmed from the raw signal before basecalling started (ts:i:10 in your read). The sample stride between move table entries is the first integer in the move table (the mv tag), which is 6 in your read.
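
To make the arithmetic concrete, here is a minimal Python sketch. It assumes the usual reading of the move table: after the leading stride value, each entry covers `stride` raw samples, and a 1 marks the point where a new base is emitted. The function names and the short toy record are made up for illustration and are not part of Dorado.

```python
def parse_move_tags(sam_line):
    """Extract ts (trimmed samples) and the mv array from one SAM text record."""
    ts, mv = 0, None
    # Optional tags start at the 12th whitespace-separated field of a SAM record.
    for tag in sam_line.split()[11:]:
        if tag.startswith("ts:i:"):
            ts = int(tag[5:])
        elif tag.startswith("mv:B:c,"):
            mv = [int(x) for x in tag[7:].split(",") if x]
    return ts, mv


def base_start_samples(mv, ts):
    """Return the raw-signal sample index at which each called base starts.

    Assumption: mv[0] is the stride and every later entry spans `stride` samples.
    """
    stride, moves = mv[0], mv[1:]
    return [ts + i * stride for i, move in enumerate(moves) if move == 1]


# Toy record (hypothetical, much shorter than a real read).
record = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACG\t###\tts:i:10\tmv:B:c,6,1,0,0,1,1,0"

ts, mv = parse_move_tags(record)
print(base_start_samples(mv, ts))  # [10, 28, 34]
```

With ts = 10 and a stride of 6, the first base starts at sample 10, and each later base starts at 10 + 6 * (index of its 1 in the move table). These indices refer to the untrimmed signal stored in the POD5 file for the same read id.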

marketanykrynova commented 1 year ago

Thanks for your reply, it will help!