swarris / Pacasus

Correction of palindromes in long reads from PacBio and Nanopore
MIT License

unknown target CPU 'k6' #5

Closed orangeSi closed 6 years ago

orangeSi commented 6 years ago

Hi~ my command is:

pacasus.py --device_type=CPU --platform_name=Intel --framework=opencl part.fasta -o cleaned.fasta --loglevel=DEBUG --maximum_memory_usage 0.2

and I got this error:

DEBUG - Total memory on Device: 8192.0
DEBUG - Initializing hitlist...
DEBUG - Initializing hitlist OK.
DEBUG - Clearing device memory.
DEBUG - Clearing normal device memory.
DEBUG - Clearing zero-copy device memory.
DEBUG - Total memory on Device: 8192.0
DEBUG - Compiling OpenCL code.
DEBUG - Converting score to string...
DEBUG - build program: binary cache miss (key: 9092090b9749e859422559dcdda24ffc)
DEBUG - build program: start building program from source on <pyopencl.Device 'pthread-QEMU Virtual CPU version (cpu64-rhel6)' on 'Portable Computing Language' at 0x2b5df60>
DEBUG - build program: start
DEBUG - build program: completed, error
ERROR - clBuildProgram failed: BUILD_PROGRAM_FAILURE -

Build on <pyopencl.Device 'pthread-QEMU Virtual CPU version (cpu64-rhel6)' on 'Portable Computing Language' at 0x2b5df60>:

error: : unknown target CPU 'k6'

(options: -I /ifshk7/BC_PS/sikaiwei/software/conda/hk_new/install/envs/Pacasus/lib/python2.7/site-packages/pyopencl/cl) (source saved as /tmp/tmpL_Dxuq.cl)

thanks~

swarris commented 6 years ago

Hi,

Thanks for downloading and using Pacasus!

It seems that you are using virtualization. That does not always work well with OpenCL: not all drivers or systems support this (yet). In your case it looks like the system reports that the CPU is an AMD K6, which is a very old type and therefore most likely not correct. Maybe check your settings? Or check /proc/cpuinfo?
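The /proc/cpuinfo check suggested here can be scripted. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo contains `model name` lines; the function names are hypothetical and not part of Pacasus.

```python
import re


def cpu_model_names(cpuinfo_text):
    """Return the distinct 'model name' values found in /proc/cpuinfo text."""
    names = []
    for line in cpuinfo_text.splitlines():
        match = re.match(r"model name\s*:\s*(.+)", line)
        if match:
            name = match.group(1).strip()
            if name not in names:
                names.append(name)
    return names


def read_cpu_model_names(path="/proc/cpuinfo"):
    """Read the given file (Linux only) and return its CPU model names."""
    with open(path) as handle:
        return cpu_model_names(handle.read())
```

Calling `read_cpu_model_names()` on the compute node shows which CPU model the (virtual) machine advertises, which can then be compared with the target the OpenCL compiler complains about.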

You could give the small pyopencl example a go and see how that works: https://documen.tician.de/pyopencl/
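Before running a full example, it can also help to list what pyopencl actually sees on the node. This is a rough sketch, not taken from Pacasus; `describe_device` and `list_opencl_devices` are hypothetical helper names, while `get_platforms`, `get_devices`, `name`, `global_mem_size` and `max_compute_units` are standard pyopencl attributes.

```python
def describe_device(name, global_mem_bytes, compute_units):
    """Format one device as a single line of text."""
    megabytes = global_mem_bytes / (1024.0 * 1024.0)
    return "%s: %.1f MB, %d compute units" % (name, megabytes, compute_units)


def list_opencl_devices():
    """Return one description line per OpenCL device visible to pyopencl."""
    # Imported here so that describe_device() works without pyopencl installed.
    import pyopencl as cl

    lines = []
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            description = describe_device(
                device.name, device.global_mem_size, device.max_compute_units
            )
            lines.append("[%s] %s" % (platform.name, description))
    return lines
```

Printing the result of `list_opencl_devices()` on both compute nodes makes it easy to spot a node whose platform or device name differs from the one that works.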

Is it possible to do without virtualization?

orangeSi commented 6 years ago

hi,

I installed pyopencl with conda, so I am still using virtualization, but I changed to another compute node and Pacasus is working!

The memory usage is rather high. For example, I used 50M of subreads for a test, but Pacasus used up to 26G of virtual memory! When I use more data, will the memory increase in the same way?

the test command is:

pacasus.py --device_type=CPU --platform_name=Intel --framework=opencl $in -o cleaned.fasta --loglevel=DEBUG --maximum_memory_usage 0.05 --number_of_compute_units 2

swarris commented 6 years ago

Good to hear it is running.

The amount of memory used depends on the length of the read being processed, not on the size of the data set. Each read is processed independently of the rest. However, to detect a palindrome it is necessary to find both the start and the end of the local alignment. This means that the entire Smith-Waterman scoring matrix needs to be kept in memory. For very long reads this can indeed lead to memory usage of many GB. This is unfortunate, but at the moment also unavoidable. We are looking into ways of reducing the amount of memory required, for example by doing banded alignments.
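To get a feel for the numbers, here is a back-of-the-envelope estimate. It assumes the read is aligned against a sequence of the same length (for example its own reverse complement), so the matrix has roughly length × length cells, and it assumes 4 bytes per cell. Both are illustrative assumptions; the actual cell layout used by pyPaSWAS is not stated in this thread.

```python
def matrix_bytes(read_length, bytes_per_cell=4):
    """Estimate the memory for a full read_length x read_length scoring matrix."""
    return read_length * read_length * bytes_per_cell


def matrix_gigabytes(read_length, bytes_per_cell=4):
    """Same estimate expressed in gigabytes (10**9 bytes)."""
    return matrix_bytes(read_length, bytes_per_cell) / 1e9


for length in (15000, 25000, 50000):
    print("%6d bases: about %.1f GB" % (length, matrix_gigabytes(length)))
```

Under these assumptions a 15 kb read needs about 0.9 GB and a 50 kb read about 10 GB: doubling the read length quadruples the memory, which is why a few very long reads dominate the peak usage regardless of how many reads the data set contains.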

orangeSi commented 6 years ago

OK, I will test the maximum memory and run time, thanks~

My data is from a non-amplified DNA library. Is Pacasus also suitable for that?

swarris commented 6 years ago

Yes, that should work perfectly. We found that in some SMRT cells the SMRTbell adapter was missing in many reads. In those cases the reads also contained palindromes, which Pacasus was able to detect. But other uses are of course also possible.

Keep me posted, I'm curious how you use the tool and what the results are. And if you run into strange things again, feel free to create a new issue.

orangeSi commented 6 years ago

The script has used 17 days of CPU time, and the maximum memory so far is 79G. The subreads are 1G of bacterial data. I just killed the script because it took too much CPU time and memory~ Anyway, thanks for this tool~

swarris commented 6 years ago

Sorry to hear it takes so long. The Smith-Waterman algorithm is extremely time-consuming, but it is currently the only way of finding palindromes in low-quality reads.
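To illustrate why this is costly, below is a minimal pure-Python sketch of a Smith-Waterman local alignment of a read against its own reverse complement. It is not the Pacasus or pyPaSWAS implementation (which runs on OpenCL or CUDA); the scoring values and function names are assumptions chosen for illustration. The sketch fills the full matrix, which takes time and memory quadratic in the read length, and then traces back from the best cell to find where the alignment starts.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, (start_a, end_a), (start_b, end_b)) of the best local
    alignment, using half-open coordinates and a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]  # the full matrix is kept
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            value = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = value
            if value > best:
                best, best_pos = value, (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


read = "AAACCCGGGTTT"  # toy read that equals its own reverse complement
print(smith_waterman(read, reverse_complement(read)))
```

A read with a palindrome produces a long, high-scoring alignment against its reverse complement, and the start and end coordinates indicate where the palindromic region lies. The traceback is the step that needs the whole matrix in memory.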

You can speed it up by cloning the newest version of pyPaSWAS: https://github.com/swarris/pyPaSWAS I've not tested it yet, but it should work as the interface has not changed. Also, for reads < 25k I would recommend using GPUs. That will speed up the analyses even further. It will not help with memory usage, though. I'm surprised it used up to 79GB, as I thought the memory limit for OpenCL was 64GB.

Could you tell me why you were using Pacasus, given that the data was not from amplified material?

orangeSi commented 6 years ago

My guess is that some very long subreads are missing the adapter, so they were not split correctly when the polymerase reads were divided into subreads. I first tried proovread, which reported 0.1% of the subreads as possible chimeras.

swarris commented 6 years ago

I see. That can indeed happen quite a lot with some PacBio runs. Even if this is the case in around 10% of the reads, it will mess up your downstream analyses. And I have examples of non-amplified read sets where running Pacasus was an absolute must to get a proper genome assembly. I would recommend taking a subset of reads to test, for example 1000 reads of about 15kb in length. That will give you an impression of whether or not it is worthwhile to spend time on correcting the entire data set.
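Such a test subset can be drawn with a few lines of Python. This is only a sketch with hypothetical function names and defaults (a 15 kb target, a 2 kb tolerance, 1000 reads, a fixed random seed); it assumes a plain FASTA file and uses no external libraries.

```python
import random


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pick_test_reads(records, target=15000, tolerance=2000, count=1000, seed=1):
    """Randomly pick up to `count` reads whose length is within
    `tolerance` bases of `target`."""
    candidates = [rec for rec in records if abs(len(rec[1]) - target) <= tolerance]
    if len(candidates) <= count:
        return candidates
    return random.Random(seed).sample(candidates, count)


def write_fasta(records, handle):
    """Write (header, sequence) pairs to an open file handle."""
    for header, sequence in records:
        handle.write(">%s\n%s\n" % (header, sequence))
```

For example, open the subreads file, pass the handle to `parse_fasta`, feed the result to `pick_test_reads`, and write the selection with `write_fasta` to a new file that is then given to pacasus.py. The run time and peak memory on that subset give a first indication of what the full data set would require.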

orangeSi commented 6 years ago

OK, thanks.