Closed MargauxAlison closed 5 years ago
Dear Margaux-Alison,
Thank you for your interest in Pacasus!
The maximum read length is limited by the amount of memory available on the computer (or GPU). Reads up to 20kb usually fit in 4GB RAM; longer reads need more memory. Currently, OpenCL on a CPU allows up to 64GB to be allocated, which should be enough for most PacBio datasets. ONT sets will contain reads that will not fit in memory.
This is all due to the quadratic size of the scoring matrix in the Smith-Waterman algorithm. To determine the start of the palindrome, we need to be able to do a full traceback and for this we need the entire matrix in memory.
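To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch. It assumes a read is aligned against a sequence of the same length (so the matrix has L x L cells) and that each cell costs 4 bytes; both are assumptions for illustration, not Pacasus's actual memory layout. The "20kb in 4GB" figure above suggests the real per-cell cost is higher, probably closer to 10 bytes. The function names are hypothetical.

```python
import math

def sw_matrix_bytes(read_length, bytes_per_cell=4):
    """Bytes needed for a full read_length x read_length scoring matrix.

    bytes_per_cell=4 is an assumed value, not taken from Pacasus.
    """
    return read_length * read_length * bytes_per_cell

def max_read_length(memory_bytes, bytes_per_cell=4):
    """Longest read whose full matrix fits in memory_bytes (same assumptions)."""
    return math.isqrt(memory_bytes // bytes_per_cell)

if __name__ == "__main__":
    for length in (5_000, 20_000, 40_000, 100_000):
        gib = sw_matrix_bytes(length) / 1024**3
        print(f"{length:>7} bases -> {gib:8.2f} GiB")
```

Doubling the read length quadruples the memory, which is why a handful of very long ONT reads can fail even when most of the set runs fine.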
What I also usually do is split the read set into bins of different lengths. Short reads (up to 5kb) can then also be processed in parallel, speeding up the process.
Let me know if I can be of assistance in helping you process your reads. You can also send me an email (it's in the pre-print: https://www.biorxiv.org/content/early/2017/08/09/173872). The paper has just been accepted in BMC Genomics :-)
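A minimal sketch of such binning, assuming the reads are already loaded as (name, sequence) pairs. The bin edges and the function name are hypothetical, chosen only for illustration; this is not part of Pacasus itself.

```python
from collections import defaultdict

def bin_reads_by_length(reads, edges=(5_000, 10_000, 20_000)):
    """Group (name, sequence) pairs into length bins.

    Each edge is an inclusive upper bound in bases; reads longer than the
    largest edge land in a final "gt_<edge>" bin. The edges are assumed
    values, to be tuned to the available memory.
    """
    edges = sorted(edges)
    bins = defaultdict(list)
    for name, seq in reads:
        # Pick the first bin whose upper bound fits this read.
        label = next(
            (f"le_{edge}" for edge in edges if len(seq) <= edge),
            f"gt_{edges[-1]}",
        )
        bins[label].append((name, seq))
    return dict(bins)

if __name__ == "__main__":
    demo = [("r1", "A" * 1_200), ("r2", "A" * 7_500), ("r3", "A" * 30_000)]
    for label, members in bin_reads_by_length(demo).items():
        print(label, [name for name, _ in members])
```

Each bin can then be written to its own file and run separately, so the short-read bins can use several processes while the long-read bin gets the machine's memory to itself.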
Dear Swarris,
Congrats on the paper!
And thank you for your answer. I was a bit confused by the error message because nothing in it indicated a memory problem, but now I understand. I will increase the memory and do as you suggested to speed up the analyses.
Thanks again
Hey @swarris, do you have any tips for setting up OpenCL to run with 64GB of allocable memory?
I installed these drivers https://software.intel.com/en-us/articles/opencl-drivers#cpu-section on an Ubuntu VM that should have sufficient memory, but clinfo returns:
Global memory size 63322882048 (58.97GiB)
Error Correction support No
Max memory allocation 15830720512 (14.74GiB)
I've been unable to find a way to increase the max memory allocation.
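For anyone comparing their own clinfo output, here is a small sketch that pulls the two byte counts out of the text above and compares them. The parsing helper is hypothetical and only matches lines in the format pasted here. As far as I know, a quarter of global memory is the minimum that OpenCL 1.x requires a device to report as its maximum single allocation, so this may simply be the driver reporting the minimum; that is an assumption, not something confirmed for this driver.

```python
import re

def parse_clinfo_bytes(text, field):
    """Return the byte count on the line starting with `field`, or None."""
    pattern = rf"^\s*{re.escape(field)}\s+(\d+)"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None

# The clinfo lines quoted in this thread.
CLINFO = """\
Global memory size 63322882048 (58.97GiB)
Error Correction support No
Max memory allocation 15830720512 (14.74GiB)
"""

global_mem = parse_clinfo_bytes(CLINFO, "Global memory size")
max_alloc = parse_clinfo_bytes(CLINFO, "Max memory allocation")
ratio = max_alloc / global_mem

if __name__ == "__main__":
    print(f"max single allocation is {ratio:.2%} of global memory")
```

If the scoring matrix has to live in a single buffer, it is the smaller number, not the global memory size, that bounds the read length.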
Good question! I've managed to use more than 14GB on a particular system (running CentOS), but with OpenCL you are very much dependent on the OS, the driver (and its version) and the device. I'm very surprised that an 'open standard' behaves so differently across devices and drivers. I'm still looking for help to lower the memory requirements for SW so that all read lengths can be handled on GPUs or basic desktop PCs. With ONT reads becoming longer and longer, this is becoming a real issue. With PacBio HiFi the reads are <25kb, so that should work for all reads for now.
Hello,
First, thank you for developing Pacasus; I was very happy to discover such a tool. However, I encounter some problems when running Pacasus on my data set. It failed after a few seconds, and at the end of the output I receive:
When I run it again after removing reads longer than 23kb, it seems to work (it has been running since 17h, so I assume it is functioning). Is there a limitation on read length in Pacasus? If not, do you have an idea of what the problem could be?
Thanks,
Best,
Margaux-Alison