swarris / Pacasus

Correction of palindromes in long reads from PacBio and Nanopore
MIT License

A question about the speed of Pacasus on CPU #13

Closed ilnamkang closed 4 years ago

ilnamkang commented 5 years ago

Hi,

Is Pacasus much slower when using CPU compared to when using GPU?

I'm running Pacasus on my Ubuntu 18.04 machine using OpenCL for both CPU and GPU.

Because I cannot process long (>20 kb) reads on my GPU (GeForce GTX 1660), I'm trying to use the CPU (Intel Xeon Gold 6230) for long reads.

However, Pacasus seems to run much slower on the CPU than on the GPU, because the "build program" step that follows each "binary cache miss" takes up most of the running time.

How can I solve this problem?

Below is a typical stdout for the reads that invoked "binary cache miss".

(......)
INFO - Reading query sequences 624 625...
DEBUG - Initializing reader path = /home/kangin/Genome/IMCC26084/M26084.fasta limitlength = 100000...
DEBUG - Initializing reader finished.
DEBUG - Reading from fasta file...
DEBUG - 1 sequences read.
DEBUG - Sorting records on length...
INFO - Query sequences OK.
INFO - Reading target sequences 624, 625...
DEBUG - Initializing reader path = /home/kangin/Genome/IMCC26084/M26084.fasta limitlength = 100000...
DEBUG - Initializing reader finished.
DEBUG - Reading from fasta file...
DEBUG - 1 sequences read.
DEBUG - Sorting records on length...
INFO - Target sequences OK.
INFO - Processing 1- vs 1-sequences
DEBUG - Fixing palindrome sequences...
DEBUG - Initializing hitlist...
DEBUG - Initializing hitlist OK.
DEBUG - Total memory on Device: 128910.945312
DEBUG - Compiling OpenCL code.
DEBUG - Converting score to string...
DEBUG - build program: binary cache miss (key: a8318e3c4a8e1ff37796513b2208aec8)
DEBUG - build program: start building program from source on <pyopencl.Device 'Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz' on 'Intel(R) CPU Runtime for OpenCL(TM) Applications' at 0x5572822ce2c8>
(......)

For some reads, the stdout is more detailed.

(......)
DEBUG - build program: binary cache miss (key: 7b72efc9038d352f553fb2066988d2c9)
DEBUG - build program: start building program from source on <pyopencl.Device 'Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz' on 'Intel(R) CPU Runtime for OpenCL(TM) Applications' at 0x5572822ce2c8>
DEBUG - build program: from-source build complete
DEBUG - pyopencl-invoker-cache-v6: in mem cache hit [key=244e333d3d979437f7bcd0bf55937b651d523fba9809f33202fe7f9b7c1aff5f]
INFO - build program: kernel 'calculateScoreAffineGap' was part of a lengthy source build resulting from a binary cache miss (5.96 s)
DEBUG - pyopencl-invoker-cache-v6: in mem cache hit [key=d33c929dce56ea9970bc4c7caa8922ef69ff38ce589bb361354c4e44d4d5a39f]
INFO - build program: kernel 'calculateScore' was part of a lengthy source build resulting from a binary cache miss (5.96 s)
DEBUG - pyopencl-invoker-cache-v6: in mem cache hit [key=3a13625b5e8e4a7c3a07ce1c9762440526335c08e2541724d2a4d57fa8083e41]
INFO - build program: kernel 'tracebackAffineGap' was part of a lengthy source build resulting from a binary cache miss (5.96 s)
DEBUG - pyopencl-invoker-cache-v6: in mem cache hit [key=2fed7d53e40d4d5ede75bffb1c65b92409c460755418c9e838b561cdfb9059e3]
INFO - build program: kernel 'traceback' was part of a lengthy source build resulting from a binary cache miss (5.96 s)
DEBUG - Initializing normal device memory.
(......)
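
The two logs above report different binary cache keys (starting with a8318e3c and 7b72efc9). That would be consistent with the OpenCL source text differing from read to read, for example if per-read values are written into the source before compilation. The "Converting score to string..." line hints at this, but it is an assumption and has not been checked against the Pacasus code. The toy model below is not PyOpenCL's actual key function; it only illustrates why a cache keyed on the source text would miss for every read in that situation, and would hit if the value were passed as a kernel argument instead. The kernel strings, `toy_cache_key`, and `count_builds` are hypothetical, and the strings are only hashed, never compiled.

```python
import hashlib

# Hypothetical kernel source with a per-read constant baked into the text.
BAKED_SOURCE = (
    "#define QUERY_LENGTH {n}\n"
    "__kernel void score(__global float *out) {{ out[0] = QUERY_LENGTH; }}\n"
)

# Hypothetical kernel source that receives the same value at run time.
GENERIC_SOURCE = (
    "__kernel void score(__global float *out, const int query_length) "
    "{ out[0] = query_length; }\n"
)

# Device name copied from the log above.
DEVICE = "Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz"


def toy_cache_key(source, device_name, options=""):
    """Hash the inputs that identify a build (toy model, not PyOpenCL's key)."""
    digest = hashlib.md5()
    for part in (device_name, options, source):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent parts cannot run together
    return digest.hexdigest()


def count_builds(read_lengths, bake_length_into_source):
    """Count distinct cache keys, i.e. how many from-source builds are needed."""
    keys = set()
    for length in read_lengths:
        if bake_length_into_source:
            source = BAKED_SOURCE.format(n=length)
        else:
            source = GENERIC_SOURCE
        keys.add(toy_cache_key(source, DEVICE))
    return len(keys)


if __name__ == "__main__":
    lengths = [8000, 12000, 25000]
    print("builds with baked-in length:", count_builds(lengths, True))
    print("builds with runtime argument:", count_builds(lengths, False))
```

In this model every distinct read length triggers its own build when the length is part of the source, while a single build serves all reads when it is a runtime argument. Whether that matches what Pacasus does would need to be confirmed in its source.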

swarris commented 4 years ago

I'm not sure how to avoid these cache misses. In any case, the CPU will be slower than the GPU. The Smith-Waterman alignment is processed in parallel, and in our case the GPU is much more efficient at this than the CPU, although the OpenCL CPU code base has been optimized for CPUs to minimize the slowdown. Also, long reads take (much) longer than short reads, which skews the speed comparison in favor of the GPU. If you'd really like to know the speed difference between the two, run a data set with, say, 10 kb reads on both platforms.
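
To put rough numbers on why long reads take much longer: Smith-Waterman fills a score matrix with one cell per pair of query and target positions, so the work grows with the product of the two lengths. Below is a minimal sketch, assuming a read is aligned against a sequence of its own length. The function names are illustrative and not part of Pacasus.

```python
def sw_cells(query_length, target_length):
    """Number of score-matrix cells a full Smith-Waterman alignment fills."""
    return query_length * target_length


def relative_cost(read_length, baseline_length):
    """Work for a read aligned against an equal-length sequence, relative to a baseline read."""
    return sw_cells(read_length, read_length) / sw_cells(baseline_length, baseline_length)


if __name__ == "__main__":
    baseline = 10_000  # the 10 kb reads suggested above for a CPU vs GPU comparison
    for length in (10_000, 20_000, 50_000):
        cells = sw_cells(length, length)
        print(f"{length} bp read: {cells} cells, {relative_cost(length, baseline):.1f}x the 10 kb work")
```

Under this assumption a 20 kb read needs four times the matrix cells of a 10 kb read, before any differences in memory use or device efficiency are taken into account.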