swarris / Pacasus

Correction of palindromes in long reads from PacBio and Nanopore
MIT License

Pacasus seems to get slower during running #11

ilnamkang closed 4 years ago

ilnamkang commented 5 years ago

Hi,

I'm trying to process my PacBio sequencing data obtained from MDA products using Pacasus.

This is my first time using Pacasus, and the program has been running for >40 hours on a fasta file from a single PacBio RS II cell. The number of reads in the fasta file is ~200,000.

But, it seems that Pacasus is getting slower. I'm checking GPU utilization by 'nvidia-smi'. During the first ~10 hours, GPU utilization ranged ~10-20%. However, after ~20 hours, GPU utilization dropped to ~0-5% (nearly always 0% or 1%).

Are there any methods that can increase (or maintain) the speed (or GPU utilization) of Pacasus?

I'm running Pacasus on my Ubuntu (18.04) machine equipped with NVIDIA GeForce GTX 1660 (6 GB memory).

My command was as below:

./pacasus.py --device_type=GPU --platform_name=NVIDIA --framework=OpenCL --limit_length=20000 -o pacasus-out.fasta M26084.fasta

(I've failed to install pyCuda, but succeeded in installing pyOpenCL. When I ran without --limit_length option, the program stopped due to memory overallocation.)

Thanks.

Ilnam

swarris commented 4 years ago

Hi Ilnam,

Sorry for the late response!

During running, Pacasus collects all results in memory instead of streaming them directly to file. I can see that after running for such a long time the Python objects holding the results become quite large, and Python can then become slow. So I assume the problem is not so much with the GPU code. I have never run a single instance of Pacasus for such a long time. My approach:

The last step is to avoid redoing a lot of reads when something goes wrong (some GPU configurations tend to lock up for no apparent reason...). With the first step you can run multiple Pacasus instances at once. As memory consumption for short reads is very low, you can run, for example, 5 of those combined with one instance processing the very long reads. In many cases the memory limit for OpenCL on the GPU is about 2.5GB, which might explain running out of memory with reads > 20kb. You can also run it on the CPU by installing the OpenCL driver for your platform. The memory limit in that case is 64GB.
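As an illustration of avoiding repeated work after a crash or GPU lock-up, here is a minimal Python sketch that runs Pacasus once per input file and skips any file whose output already exists. The flags are copied from the command quoted earlier in this thread. The helper names (`output_path`, `run_pending`), the `.pacasus.fasta` and `.partial` naming, and the idea of renaming the output only after a clean exit are assumptions of this sketch, not part of Pacasus.

```python
import os
import subprocess

# Flags copied from the command quoted earlier in this thread; adjust for your setup.
PACASUS_CMD = [
    "./pacasus.py",
    "--device_type=GPU",
    "--platform_name=NVIDIA",
    "--framework=OpenCL",
    "--limit_length=20000",
]


def output_path(chunk_path):
    """Hypothetical naming scheme: reads.0001.fasta -> reads.0001.pacasus.fasta."""
    root, _ = os.path.splitext(chunk_path)
    return root + ".pacasus.fasta"


def run_pending(chunk_paths, run=subprocess.run):
    """Process each input file once, skipping files whose output already exists."""
    processed = []
    for chunk in chunk_paths:
        final = output_path(chunk)
        if os.path.exists(final):
            continue  # finished in an earlier run, nothing to redo
        partial = final + ".partial"
        # check=True raises CalledProcessError if the command exits with an error.
        run(PACASUS_CMD + ["-o", partial, chunk], check=True)
        os.replace(partial, final)  # mark as done only after a clean exit
        processed.append(chunk)
    return processed
```

If a run is interrupted, calling `run_pending` again with the same list of files only processes the ones that did not finish. Several such loops over different sets of files could be started side by side to run multiple instances at once, memory permitting.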

Let me know how this works out for you.

jcerca commented 4 years ago

Dear Sven,

I'll jump on this thread to ask you about speeding up the process; asking here and not via e-mail as I hope it benefits everyone.

So (step 1), from what I understood from your comment, you sort the fasta into 3 files containing: all reads <500 bp; all reads between 501-2000 bp; all reads 2001-5000 bp; all reads >5001 bp

Or do you mean (this one doesn't make much sense to me, as you would be artificially increasing your coverage by copying the reads): all reads <500 bp; all reads <2000 bp; all reads <5000 bp

Then in step 2 you cap the files at 10,000 reads and run these independently through Pacasus. Is it possible to lower this number to, say, 5,000, 2,000 or even 1,000 reads, or should there be a minimum number of reads?

Best, José

swarris commented 4 years ago

Dear José,

You are correct with the first description of step 1, otherwise you would indeed process a (short) read several times.
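For readers who want to try this, below is a minimal Python sketch of the non-overlapping split, in which every read ends up in exactly one file. The length cut-offs (500, 2000 and 5000 bp) come from the bins listed in the question above. How reads of exactly 500, 2000 or 5000 bp are assigned, the output file naming, and the function names are assumptions of this sketch and not part of Pacasus.

```python
from bisect import bisect_left

# Upper length bounds (bp) from the bins discussed above. Assumption: a read
# whose length equals a bound goes into the lower bin.
BOUNDS = (500, 2000, 5000)
LABELS = ("le500", "501-2000", "2001-5000", "gt5000")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def bin_label(length):
    """Return the label of the single length bin a read belongs to."""
    return LABELS[bisect_left(BOUNDS, length)]


def split_by_length(fasta_path, out_prefix):
    """Write each read to exactly one per-bin FASTA file; return reads per bin."""
    counts = dict.fromkeys(LABELS, 0)
    outputs = {label: open(f"{out_prefix}.{label}.fasta", "w") for label in LABELS}
    try:
        with open(fasta_path) as handle:
            for header, seq in read_fasta(handle):
                label = bin_label(len(seq))
                outputs[label].write(f"{header}\n{seq}\n")
                counts[label] += 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

Because each read is written to one bin only, the coverage is not inflated, which is the concern raised about the second interpretation.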

For the second step you are free to choose, taking the following into consideration:

However, the largest performance gain would come from processing more reads in parallel (https://github.com/swarris/pyPaSWAS/issues/7#issuecomment-551150090), but I still have to find the time to implement that.

ilnamkang commented 4 years ago

Hi,

Your suggestion worked nicely for me.

I just split my input fasta file into chunks of 10,000 reads without considering read length, and ran Pacasus on the resulting files one by one. It took ~15 hrs to process all the files. This is a dramatic speed-up, because it took ~5 days when the input file was processed as a whole without splitting.
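A split like the one described above could be sketched in Python as follows. This is only an illustration, not the exact procedure that was used: the function names and the numbered output file naming are made up for the example, and the chunk size defaults to the 10,000 reads mentioned above.

```python
from itertools import islice


def fasta_records(handle):
    """Yield each FASTA record as a list of lines (header line first)."""
    record = []
    for line in handle:
        if line.startswith(">"):
            if record:
                yield record
            record = [line.rstrip("\n") + "\n"]
        elif record and line.strip():
            record.append(line.rstrip("\n") + "\n")
    if record:
        yield record


def split_fasta(fasta_path, out_prefix, reads_per_file=10_000):
    """Write consecutive chunks of at most reads_per_file reads; return their paths."""
    paths = []
    with open(fasta_path) as handle:
        records = fasta_records(handle)
        while True:
            chunk = list(islice(records, reads_per_file))
            if not chunk:
                break
            path = f"{out_prefix}.{len(paths):04d}.fasta"
            with open(path, "w") as out:
                for record in chunk:
                    out.writelines(record)
            paths.append(path)
    return paths
```

Each resulting file can then be passed to Pacasus as its own input, so the results held in memory by any single run stay small.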

If I follow your tips more rigorously, i.e. if I split my input files considering read length and run multiple instances of Pacasus for short reads, then the running time would decrease further.

Thank you very much for your help.