lbcb-sci / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads
MIT License

Why are there two `aligning overlaps` (per split/chunk)? #30

Closed: SHuang-Broad closed this issue 3 years ago

SHuang-Broad commented 4 years ago

Hi Robert,

While running racon on my draft assembly using GPUs, I observed two `aligning overlaps` steps: the first uses the GPU and is relatively quick, and the second uses the CPU and takes considerably longer.

Am I setting parameters in a wrong way, or is this expected?

My largest contig is about 90M, with NG50 ~11M and LG50 ~70, out of 3,000 to 4,000 unscaffolded contigs from a primate genome.

# 10000000000 == 10,000,000,000, aka 10GB
./racon_wrapper -u -t 32 -c 4 --cudaaligner-batches 50 --split 10000000000 ...
[RaconWrapper::run] preparing data with rampler
[RaconWrapper::run] total number of splits: 2
[RaconWrapper::run] processing data with racon
Using 2 GPU(s) to perform polishing
Initialize device 0
Initialize device 1
[CUDAPolisher] Constructed.
[racon::Polisher::initialize] loaded target sequences 10.777237 s
[racon::Polisher::initialize] loaded sequences 2096.867626 s
[racon::Polisher::initialize] loaded overlaps 41.415534 s
[racon::CUDAPolisher::initialize] allocated memory on GPUs for alignment 0.603412 s
[racon::CUDAPolisher::initialize] aligning overlaps [====================] 850.375795 s
[racon::Polisher::initialize] aligning overlaps [====================] 3125.817517 s
[racon::Polisher::initialize] transformed data into windows 57.191808 s
[racon::CUDAPolisher::polish] allocated memory on GPUs for polishing 62.591201 s
[racon::CUDAPolisher::polish] generating consensus [====================] 2028.678508 s
[racon::CUDAPolisher::polish] polished windows on GPU 2246.814400 s
[racon::CUDAPolisher::polish] polished remaining windows on CPU 10.493641 s
[racon::CUDAPolisher::polish] generated consensus 7.312445 s
[racon::Polisher::] total = 8684.385253 s

Thanks, Steve
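For readers following the numbers, the two `aligning overlaps` timings in the log above can be compared with a short script. This is only a sketch: the log lines are copied from the post, while the helper name `stage_seconds` and the regular expression are illustrative assumptions, not part of racon. The stage timers in the log appear to overlap (they sum to more than the reported total), so the shares are computed against the reported total only.

```python
import re

# Log lines copied verbatim from the post above.
LOG = """\
[racon::Polisher::initialize] loaded target sequences 10.777237 s
[racon::Polisher::initialize] loaded sequences 2096.867626 s
[racon::Polisher::initialize] loaded overlaps 41.415534 s
[racon::CUDAPolisher::initialize] allocated memory on GPUs for alignment 0.603412 s
[racon::CUDAPolisher::initialize] aligning overlaps [====================] 850.375795 s
[racon::Polisher::initialize] aligning overlaps [====================] 3125.817517 s
[racon::Polisher::initialize] transformed data into windows 57.191808 s
[racon::CUDAPolisher::polish] allocated memory on GPUs for polishing 62.591201 s
[racon::CUDAPolisher::polish] generating consensus [====================] 2028.678508 s
[racon::CUDAPolisher::polish] polished windows on GPU 2246.814400 s
[racon::CUDAPolisher::polish] polished remaining windows on CPU 10.493641 s
[racon::CUDAPolisher::polish] generated consensus 7.312445 s
[racon::Polisher::] total = 8684.385253 s
"""

# "[scope] stage text [optional progress bar] <seconds> s"
LINE_RE = re.compile(
    r"^\[(?P<scope>[^\]]*)\]\s+(?P<stage>.*?)\s+(?:\[=*\]\s+)?(?P<secs>\d+\.\d+) s$"
)


def stage_seconds(log_text):
    """Hypothetical helper: return (scope, stage, seconds) for each timing line."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append(
                (match.group("scope"), match.group("stage"), float(match.group("secs")))
            )
    return rows


rows = stage_seconds(LOG)
total = next(secs for _, stage, secs in rows if stage.startswith("total"))
gpu_align = next(
    secs for scope, stage, secs in rows
    if "CUDAPolisher" in scope and stage == "aligning overlaps"
)
cpu_align = next(
    secs for scope, stage, secs in rows
    if "CUDAPolisher" not in scope and stage == "aligning overlaps"
)

cpu_share = cpu_align / total          # fraction of the reported total
cpu_vs_gpu = cpu_align / gpu_align     # how much longer the CPU pass ran

print(f"CPU alignment share of total: {cpu_share:.1%}")
print(f"CPU pass / GPU pass: {cpu_vs_gpu:.2f}x")
```

On this log, the CPU alignment pass accounts for roughly 36% of the reported total, which is why moving more overlaps to the GPU is worth pursuing.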

rvaser commented 4 years ago

Hello Steve, I suspect that some of the reads are too large to align on the GPU, so they are left for the CPU aligner. I am not certain, though; maybe @tijyojwad can answer this.

Best regards, Robert

SHuang-Broad commented 4 years ago

Yeah, that's what I suspect too. If that is indeed the case, is it limited by the memory on the GPU?

tijyojwad commented 4 years ago

Hi @SHuang-Broad, that's indeed what's happening. Right now we have some upper limits on the size of sequences per alignment. I'm working on a change to racon right now where, instead of hard-coding the upper limit, we calculate it based on all the overlaps available. I think this should allow many more of the overlaps to be aligned on the GPU. I'll be submitting a PR for this in a few days, so hopefully that'll speed things up for you. Will ping this PR when it's done.
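To illustrate the idea described above (a fixed size limit versus a limit derived from the overlaps themselves), here is a minimal sketch. It is not racon's code: the function names, the `(query_len, target_len)` representation, the quantile-based rule, and the optional `hard_cap` are all assumptions made for illustration. The thread does not say how the limit is actually computed.

```python
import math


def choose_limit(lengths, coverage=0.99, hard_cap=None):
    """Hypothetical rule: smallest length that covers `coverage` of the overlaps.

    `hard_cap` stands in for whatever ceiling GPU memory would impose.
    """
    if not lengths:
        return 0
    ordered = sorted(lengths)
    # Index of the element at the requested coverage quantile.
    idx = max(0, min(len(ordered) - 1, math.ceil(coverage * len(ordered)) - 1))
    limit = ordered[idx]
    if hard_cap is not None:
        limit = min(limit, hard_cap)
    return limit


def partition(overlaps, limit):
    """Split overlaps into GPU-eligible ones and those left for the CPU aligner."""
    gpu, cpu = [], []
    for query_len, target_len in overlaps:
        if max(query_len, target_len) <= limit:
            gpu.append((query_len, target_len))
        else:
            cpu.append((query_len, target_len))
    return gpu, cpu


# Made-up (query_len, target_len) pairs, for illustration only.
overlaps = [(1000, 1200), (5000, 4800), (20000, 21000), (900, 950)]

# A fixed limit leaves the longest overlap for the CPU.
gpu_fixed, cpu_fixed = partition(overlaps, limit=5000)

# A limit derived from the data can admit every overlap, subject to the cap.
longest_sides = [max(q, t) for q, t in overlaps]
derived = choose_limit(longest_sides, coverage=1.0)
gpu_derived, cpu_derived = partition(overlaps, derived)

print(len(gpu_fixed), len(cpu_fixed))
print(derived, len(gpu_derived), len(cpu_derived))
```

The trade-off in such a scheme is that a larger limit sends more overlaps to the GPU but needs more device memory per alignment, which is presumably why a cap of some kind remains necessary.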

SHuang-Broad commented 4 years ago

That's awesome, Joyjit!

Thank you both!

I will leave this open and you can close this ticket in the PR, if appropriate.

tijyojwad commented 4 years ago

Hi @SHuang-Broad - can you try the new version of racon, which includes some updates to the CUDA alignment integration? The code now handles task distribution better, so more alignments should go to the GPU. Empirically, `--cudaaligner-batches 8` gives good results.