nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Dorado duplex basecalling with modified bases doesn't work/respond #817

Closed: desmodus1984 closed this issue 1 month ago

desmodus1984 commented 1 month ago

Issue Report

Please describe the issue:

Hi, I was finally able to sequence my samples, and I want to basecall my data and, if possible, call modified bases. I read that duplex basecalling supports modified bases, so I ran the following command, but it produced no log output at all:

nohup /home/juaguila/appz/dorado-0.6.2-linux-x64/bin/dorado duplex --min-qscore 7 -c 50000 --emit-fastq sup,5mC_5hmC Ju760_1-split_by_channel/ > duplex.test.log

I am running on a server with 48 CPU cores. I assumed Dorado would detect the available cores automatically. The log file was empty.

Steps to reproduce the issue:

Please list any steps to reproduce the issue:

I "installed" Dorado (the dorado-0.6.2-linux-x64 build) based on …

Next, I prepared the data for duplex basecalling with pod5, following "Improving the Speed of Duplex Basecalling". I then ran dorado duplex on the resulting directory (Ju760_1-split_by_channel/) and got no response from Dorado.

Run environment:

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Icon name: computer-server
Chassis: server
Machine ID: e86d5e6c9e304b10bfe9eb2c698cee62
Boot ID: d418135d14f44223b92be19e8f50a591
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-1160.105.1.el7.x86_64
Architecture: x86-64

Logs

empty

HalfPhoton commented 1 month ago

Hi @desmodus1984,

From your command:

nohup dorado duplex --min-qscore 7 --chunksize 50000 --emit-fastq sup,5mC_5hmC Ju760_1-split_by_channel/ > duplex.test.log

There could be a couple of issues:

  1. Dorado writes its htslib outputs (FASTQ/BAM) to stdout and its logging output to stderr. To capture the logs in a file, you must redirect stderr using 2>, as follows:
    • dorado duplex sup,5mC_5hmC data/ ... > output.fastq 2> duplex.test.log
  2. A --chunksize of 50_000 is unusually large and could be causing instability. I recommend not changing this value and leaving it at the default of 10_000. This could be why no FASTQ data is being written to duplex.test.log. Do you have a specific reason for setting the chunk size to 50k?
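To make the stdout/stderr split from point 1 concrete, here is a small self-contained sketch that runs without Dorado. The function fake_dorado is hypothetical: it only imitates the behaviour described above by writing one FASTQ record to stdout and one placeholder log line to stderr. The file names mirror the ones in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: fake_dorado is a HYPOTHETICAL stand-in, not part of Dorado.
# It mimics the split described above: reads go to stdout, logs to stderr.
set -euo pipefail

fake_dorado() {
  echo "example log message" >&2       # placeholder log output -> stderr
  printf '@read1\nACGT\n+\nIIII\n'      # one FASTQ record -> stdout
}

workdir=$(mktemp -d)

# '>' captures stdout (the reads); '2>' captures stderr (the logs).
fake_dorado > "$workdir/output.fastq" 2> "$workdir/duplex.test.log"

echo "FASTQ lines: $(wc -l < "$workdir/output.fastq" | tr -d ' ')"
echo "Log contents: $(cat "$workdir/duplex.test.log")"
```

Running it shows the log line landing only in duplex.test.log, while output.fastq holds nothing but the FASTQ record. Without the 2> redirect, the log line would not end up in the reads file.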

Do you have a GPU in this system? Duplex basecalling with a sup model and modified bases is going to be very slow without a GPU.
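One quick way to answer the GPU question on a Linux server like this one is sketched below. It assumes NVIDIA hardware and relies on nvidia-smi, which ships with the NVIDIA driver; its -L option lists one GPU per line. The function name gpu_status is hypothetical, and the script only reports what it finds; it does not change how Dorado selects a device.

```shell
#!/usr/bin/env bash
# Sketch: check for an NVIDIA GPU before starting a long duplex run.
# Assumes NVIDIA hardware; nvidia-smi ships with the NVIDIA driver.
set -euo pipefail

# Print a one-line summary of NVIDIA GPU availability (hypothetical helper).
gpu_status() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    # nvidia-smi -L lists one GPU per line; show the first one.
    echo "GPU found: $(nvidia-smi -L | head -n 1)"
  else
    echo "No NVIDIA GPU detected: expect duplex sup basecalling with mods to be very slow"
  fi
}

gpu_status
```

If the script reports no GPU, a 48-core CPU alone is unlikely to make a sup duplex run with modified bases practical.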

Kind regards, Rich