mocherry commented 7 months ago

Issue Report

Please describe the issue:

I have fast5 files and wanted to basecall them with Dorado. These files had previously been analyzed, I think with Guppy, and we now want to reproduce the results with Dorado.
With the following command: dorado basecaller rna004_130bps_sup@v3.0.1 c:\111_mk\special_programs\dorado\bin\eab2 --modified-bases m6A_DRACH > sham.bam

we get this error: Sample rate for model (4000) and data (3000) are not compatible


Steps to reproduce the issue:


Run environment:

Dorado version: dorado-0.5.3-win64
Dorado command: dorado basecaller rna004_130bps_sup@v3.0.1 c:\111_mk\special_programs\dorado\bin\eab2 --modified-bases m6A_DRACH > sham.bam
Operating system: Windows10
Hardware (CPUs, Memory, GPUs): 12th Gen Intel Core i9-12900KF, 128 GB RAM, GPU: NVIDIA GeForce RTX 3090 Ti, 24 GB
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): fast5
Source data location (on device or networked drive - NFS, etc.): external SSD
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs


Thanks and best, Matthias

ethan-mcq commented 7 months ago

The RNA kit you used is different from the one the RNA004 model can basecall. The models that support a 3 kHz (3000) sample rate are rna002_70bps_hac@v3 and rna002_70bps_fast@v3. Your sample rate also indicates that you used the RNA002 kit.

Hope this helps. -Ethan
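The compatibility rule described above (the model's sample rate must match the data's) can be sketched as a simple lookup. This is a minimal illustration, assuming only the model names and sample rates mentioned in this thread; `MODEL_SAMPLE_RATES`, `compatible_models` and `check_model` are hypothetical names, not part of Dorado, and Dorado's real model list is longer.

```python
# Hypothetical helper: model names and sample rates are taken only from this
# thread (the dorado error message and the reply above). This is NOT an
# authoritative or complete list of dorado models.
MODEL_SAMPLE_RATES = {
    "rna004_130bps_sup@v3.0.1": 4000,
    "rna002_70bps_hac@v3": 3000,
    "rna002_70bps_fast@v3": 3000,
}


def compatible_models(data_sample_rate):
    """Return the known models whose sample rate matches the data, sorted by name."""
    return sorted(
        name for name, rate in MODEL_SAMPLE_RATES.items() if rate == data_sample_rate
    )


def check_model(model, data_sample_rate):
    """Raise ValueError (worded like dorado's error) if the sample rates differ."""
    model_rate = MODEL_SAMPLE_RATES[model]
    if model_rate != data_sample_rate:
        suggestions = ", ".join(compatible_models(data_sample_rate)) or "none known"
        raise ValueError(
            f"Sample rate for model ({model_rate}) and data ({data_sample_rate}) "
            f"are not compatible; try one of: {suggestions}"
        )


if __name__ == "__main__":
    # Data recorded at 3000 Hz (RNA002), as in this issue.
    print(compatible_models(3000))
    try:
        check_model("rna004_130bps_sup@v3.0.1", 3000)
    except ValueError as err:
        print(err)
```

For the data in this issue, the sketch points to the two RNA002 models; whether an m6A modified-base model exists for them is a separate question.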

mocherry commented 7 months ago

Hi Ethan,

Thanks for the clarification. I am just a bit confused by all the different kits and technologies. In the previous analysis, which we are trying to reproduce, this dataset was (somehow?) used to analyze m6A RNA methylation. As far as I understand, only the model I used, in the Dorado version I used, is able to extract this information from the fast5 files. Would Guppy be able to work with this data? So the question is: how do I have to basecall this kind of data so that the modifications end up in a BAM file? This must be possible somehow, as the people who did the previous analysis obviously managed it. Unfortunately, they are hard to get hold of because they have moved to industry, and we wanted to pave the way for our own upcoming analysis.

Please excuse my somewhat naive questions, but everyone has to start somewhere; not everybody is a bioinformatics aficionado, and some of us have to count on the help of more proficient peers.

Thanks in advance for any help you can provide.

Best,
Matthias


HalfPhoton commented 7 months ago

@mocherry, to the best of my knowledge we don't have an RNA002 m6A model. I'll ask the mods team to double-check this, though. Kind regards, Rich

mocherry commented 7 months ago

So if I understand correctly, it is not possible to use Dorado with this kind of data, right? The data, however, have been analyzed for m6A RNA modification. Maybe Guppy was used. Would Guppy be able to work with this data?

Psy-Fer commented 7 months ago

Hello, I'm not from nanopore, but just chiming in to help.

For RNA002 m6A data, the modifications were probably called with one of the tools from the community.

For some of the most popular tools, see this benchmarking paper:

Systematic comparison of tools used for m6A mapping from nanopore direct RNA sequencing

https://www.nature.com/articles/s41467-023-37596-5

I hope that helps you find what was used previously.

Cheers, James

HalfPhoton commented 6 months ago

@mocherry, has your question been answered satisfactorily?

nanoporetech / dorado

Sample rate for model (4000) and data (3000) are not compatible #662
