Closed adbeggs closed 1 year ago
PS: At the current rate it's going on the V100, it won't finish for 90 days! Guppy would usually take 4-5 days, depending on the volume of data.
Hi Andrew, that seems odd, a few questions:
There is an edge case where Stereo will run slowly if follow-on rates are low, especially if you run out of RAM. I suspect this is what you are encountering. It's something we will fix early in the new year.
Hi Mike

The nodes have 500 GB of system RAM, but the job wasn't being given the entire node. I have now set it to use the entire node, but it is still very, very slow; in fact, on our HPC dorado initiates but doesn't run. I might recompile from source to see if that makes any difference. Output is here:
```
CUDA/11.4.1
GCCcore/11.2.0
zlib/1.2.11-GCCcore-11.2.0
binutils/2.37-GCCcore-11.2.0
GCC/11.2.0
ncurses/6.2-GCCcore-11.2.0
zlib/1.2.11-GCCcore-11.2.0
bzip2/1.0.8-GCCcore-11.2.0
XZ/5.2.5-GCCcore-11.2.0
OpenSSL/1.1
cURL/7.78.0-GCCcore-11.2.0
SAMtools/1.15.1-GCC-11.2.0

[2022-12-28 14:14:22.917] [info] > Loading pairs file
[2022-12-28 14:14:22.939] [info] > Pairs file loaded
[2022-12-28 14:14:25.542] [warning] > warning: auto batchsize detection failed
[2022-12-28 14:14:27.389] [info] > Starting Stereo Duplex pipeline
```
It just sits there for hours and hours not doing anything. Duplex pairing rates on this library are 60%.
BW
Andrew
That doesn’t match my theory, could you confirm if simplex calling works on this node? Could you also check if you are running out of RAM and falling back to swap memory?
PS: How many CPU cores are available to the job on this node?
Hi Mike
Yes, simplex calling works fine on this node, calling very quickly as expected. There are 20 cores available (it's an Ice Lake node). When I run it, memory usage peaks at only 5 GB:
```
| Requested cpu=20,mem=400G,node=1,billing=20,gres/gpu=1 - 7-00:00:00 walltime
| Assigned to nodes bear-pg0103u14a
| Command /rds/projects/b/beggsa-clinicalnanopore/adb/NA12878/20221212_1633_3E_PAM86221_1ab2d60f/rundorado.slurm
| WorkDir /rds/projects/b/beggsa-clinicalnanopore/adb/NA12878/20221212_1633_3E_PAM86221_1ab2d60f
+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
| Finished at Wed Dec 28 14:35:15 2022 for beggsa(8152) on the BlueBEAR Cluster
| Required (00:13.314 cputime, 5017850K memory used) - 00:01:29 walltime
| JobState COMPLETING - Reason None
| Exitcode 0:15
+--------------------------------------------------------------------------+
```
I terminated the job as it isn't doing anything...
Even on the P24 it is painfully slow, it has been running for 2 hours and has only managed to process 7200 reads!
The only thing I can think of is that I am running it on a single, very large pod5 file (1100 GB). Would that make a difference? It doesn't seem to matter for simplex.
Ah, a very large pod5 is a relatively untested case and I can see several ways it would cause poor performance - luckily all quite fixable.
I will keep this issue open and get a fix to you in early Jan.
In the meantime, could you demux the pod5 into smaller ones by channel ID and run stereo independently on each? This should be a best-case scenario for performance with the present implementation.
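A sketch of that demux, assuming the `pod5` CLI from the pod5 Python package (the `pod5` commands are left commented out, and all file names are hypothetical); the middle step runs on any read_id/channel summary table:

```shell
# 1) Dump read_id and channel for every read in the large file:
#      pod5 view reads.pod5 --include "read_id, channel" --output summary.tsv
# Stand-in for that output, so the rest of the sketch runs on its own:
printf 'read_id\tchannel\nr1\t101\nr2\t101\nr3\t202\n' > summary.tsv

# 2) Write one read-id list per channel (plain awk; no pod5 needed):
awk 'NR > 1 { print $1 > ("channel_" $2 ".txt") }' summary.tsv

# 3) Subset the large pod5 with each list, then run stereo on each piece:
#      for ids in channel_*.txt; do
#        pod5 filter reads.pod5 --ids "$ids" --output "split/${ids%.txt}.pod5"
#      done
```

Recent pod5 versions may also do the split in one step with `pod5 subset reads.pod5 --summary summary.tsv --columns channel --output split/`; treat the exact flags as an assumption and check `pod5 --help` for your version.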
```
#SBATCH --tasks 20
```
I can't explain the V100, but I think this SBATCH parameter is going to try to load 20 instances of Dorado, all of them contending for the entire A30. What happens when you change this to the following?
```
#SBATCH --tasks 1
#SBATCH --cpus-per-task=20
```
What I was actually thinking was
```
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
```
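Put together, a minimal sbatch header along those lines might look like this (a sketch only; the model path and file names are hypothetical, and the point is simply one task driving the GPU with all the cores):

```shell
#!/bin/bash
#SBATCH --ntasks=1           # one Dorado process, not 20
#SBATCH --cpus-per-task=20   # give that single process all 20 cores
#SBATCH --gres=gpu:1         # and one GPU

dorado duplex "$MODEL_PATH" pod5/ --pairs pairs.txt > duplex.sam
```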
Update - it is a lot quicker with single POD5 files... interesting!
Same issue here with slow stereo calling, but it persists even when using multiple small pod5s. I'm using dorado 0.1.1 on Ubuntu 22.04. Simplex calling with the dna_r10.4.1_e8.2_400bps_fast@v4.0.0 model runs at 40,000 reads/s, but when I try stereo duplex, even with the "fast" model, it calls at 300 reads per minute.
Duplex tools claimed I had an 18% duplex rate. Any fix?
Hi

Would there be a speed benefit from using the SAM file from the simplex super-accuracy basecalling as input for dorado duplex calling (I think that was mentioned at NCM)? And if so, how is it supplied?
I tried running:

```
dorado duplex dna_r10.4.1_e8.2_400bps_sup@v4.0.0 --pairs pairs_from_sam/pair_ids_filtered.txt sam_dir/ > duplex_orig.sam
```
However, it did not find any reads and just completed with 0 reads basecalled.
```
$ dorado duplex -h
Usage: dorado [-h] [--pairs VAR] [--emit-fastq] [--threads VAR] [--device VAR] [--batchsize VAR] [--chunksize VAR] [--overlap VAR] [--num_runners VAR] model reads

Positional arguments:
  model    Model
  reads    Reads in Pod5 format or BAM/SAM format for basespace.
```
Hi @Kirk3gaard, in sam_dir do you have pod5 files or a SAM file? Dorado duplex calling requires the raw data in POD5 format; this is what `reads` in the help refers to.
Hi @vellamike, so the help text suggesting "BAM/SAM format for basespace" is not an option for speeding things up? Or even a real option any more? I was just wondering how I get to the "duplex for free" scenario mentioned in the NCM presentation (see below) when I have already done simplex calling in super-accuracy mode. (The RTX 4090 card basecalled our best PromethION run, ~200 Gbp, in 3 days with sup for simplex reads.) Reference: https://youtu.be/8DVMG7FEBys
Ah, that is a hidden method for the eagle-eyed :)
This is a method which is very fast but works in sequence-space only so is less accurate, please run it like so:
```
duplex basespace /path/to/bam.bam --pairs /path/to/pairs.txt
```
This method is experimental - feedback welcome!
Sneaky. Thanks a lot! OK, so the recommended way of getting the most out of a sequencing run (and the GPUs) at the moment is to:

1. basecall everything in simplex super-accuracy mode,
2. identify the duplex pairs from those simplex calls, and
3. run basespace duplex calling on the resulting pairs file.
Looking forward to seeing a simplification of this process that outputs simplex and duplex with one command. I will give the basespace and pod5-based versions a try and see how long they take.
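Spelled out as commands, that route might look like the following. This is a sketch only: the duplex_tools invocation and file names are assumptions based on the paths used earlier in this thread, not a verified recipe.

```shell
# 1) Simplex basecall everything with the sup model (the GPU-heavy step):
dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v4.0.0 pod5/ > simplex_sup.bam

# 2) Identify duplex pairs from the simplex calls (duplex-tools package; hypothetical usage):
duplex_tools pair --output_dir pairs_from_bam simplex_sup.bam

# 3) Basespace duplex calling from the simplex BAM, with no second pass over raw signal:
dorado duplex basespace simplex_sup.bam --pairs pairs_from_bam/pair_ids_filtered.txt > duplex.sam
```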
Hi @Kirk3gaard - yes, that is currently the best method. We are working on usability and performance improvements all the time and any feedback is very welcome.
P.S. Sorting pod5 by channel ID is a "nice to have" but not crucial.
Hi @vellamike, I'm still seeing this issue. I have single POD5 files; fast simplex calling in dorado on our A30 runs at 3e07 samples/s, but when I call duplex it just sits there saying "Starting Stereo Duplex pipeline".
I've checked, and it has the whole A30 node available to it, so it shouldn't be running slowly. I am running it on Red Hat but can't see anything specific that might be causing the issue.
The whole run is teeny tiny: only 200k reads, but meant to be 40% duplex.
Can you show me the Duplex command you are running?
Also, is your pairs file tab or space delimited? It needs to be space delimited, could you check this?
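A quick way to check and fix the delimiter from the shell (a sketch; the pairs file name and read ids here are made up):

```shell
# Stand-in pairs file with a tab delimiter (ids are made up):
printf 'template-read-1\tcomplement-read-1\n' > pair_ids_filtered.txt

# Inspect invisible characters: with GNU cat, tabs show up as ^I
head -n 1 pair_ids_filtered.txt | cat -A

# Convert tabs to the single spaces dorado expects:
tr '\t' ' ' < pair_ids_filtered.txt > pair_ids_space.txt
cat pair_ids_space.txt   # -> template-read-1 complement-read-1
```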
"basespace" mode tried to load the entire BAM file into RAM before starting and died when it ran out of RAM. Maybe worth enabling a smarter way to avoid the need for massive memory.
I assume that only the two reads in the pair are needed to perform duplex calling so it should be possible to load subsets of pairs without crashing. Enabling the use of fastq files as input might make it even more flexible for people to prepare subsets using existing tools in combination with the par id file.
Hi @Kirk3gaard - that is indeed a problem with the current implementation of the Basespace method, especially for very large BAMs. Could you split your BAM by channel ID into multiple BAMs and run duplex on each?
Tried running duplex with the pod5 files rather than basespace, and it crashed every time after generating a SAM file of the same size. I looked through the syslog; it apparently runs well for some time and then suddenly runs out of memory:

```
Out of memory: killed process 50831 (dorado)
oom_reaper: reaped process 50831 (dorado)
```

I would assume it should be possible to run stereo duplex calling on a machine with 96 GB RAM and 24 GB GPU RAM, as the software should not need to load all of the pod5 data into memory at once, or whatever is causing this. Any hint as to what could be causing it?
Hi Rasmus, right now the host memory consumption is governed in a complicated way by a few parameters.
We have an upcoming release soon which significantly reduces the memory requirement on the host side for duplex. In the meantime, one thing you could do is demultiplex your pod5 by channel into multiple pod5s and run stereo on each independently.
Hi @Kirk3gaard @adbeggs @incoherentian @dithiii ,
Version 0.2.1 of Dorado introduces big speed and RAM utilisation improvements to Duplex calling - could you try this?
Should we test whether it runs without splitting reads by channel?
Yes please - memory consumption is down quite a bit and this should work fine now.
I will try!
It started nicely, then processed 310,600 reads before it got "Killed".
Commands used to run dorado and output:
```shell
MODELPATH="/home/ubuntu/Desktop/software/dorado-0.2.1-linux-x64/models"
MODEL="dna_r10.4.1_e8.2_400bps_sup@v4.1.0"
POD5DIR=pod5/
dorado duplex $MODELPATH/$MODEL --device "cuda:all" --min-qscore 25 --pairs pairs_from_sam/pair_ids_filtered.txt $POD5DIR/ > duplex_$MODEL.sam
```

```
[2023-02-23 15:52:55.097] [info] > Loading pairs file
[2023-02-23 15:52:55.400] [info] > Pairs file loaded
[2023-02-23 15:52:59.938] [info] > Starting Stereo Duplex pipeline
> Reads processed: 310600Killed
```
Stereo performance improvements in https://github.com/nanoporetech/dorado/releases/tag/v0.2.2
Hi all
Running the Stereo pipeline on both a V100 (our P24, fully updated) and an HPC A30 node; both are considerably slower than the Guppy duplex pipeline. Any suggestions? Ironically, the A30 at full tilt seems slower than the V100.
From the P24:
```shell
/data/software/dorado/bin/dorado duplex "/data/software/dorado/models/dna_r10.4.1_e8.2_400bps_sup@v4.0.0" pod5/ --pairs pairs_from_bam/pair_ids_filtered.txt | samtools view -b > duplex_dorado.bam
```
From our HPC:
Many thanks in advance!
Andrew