swarris / Pacasus

Correction of palindromes in long reads from PacBio and Nanopore
MIT License

Functions of Pacasus #1

Closed danshu closed 7 years ago

danshu commented 7 years ago

Hi, the README describes Pacasus as a "Tool from detecting and cleaning PacBio / Nanopore long reads after whole genome amplification". I'm looking for a tool that aligns a read to itself and looks for palindromes, because I found that some of my PacBio subreads are actually polymerase reads that failed adapter detection and splitting. May I know if Pacasus is suitable for this purpose?

Best, Danshu

swarris commented 7 years ago

Hi Danshu,

It is suitable for this, and indeed we have used Pacasus for exactly this. Although it was originally developed to resolve palindromic sequences created by the whole genome amplification process, it can also be applied to high molecular weight DNA samples. In some cases we noticed that the adapter was either not present or not detected, which generates palindromic sequences. Pacasus will help you detect these and split the reads at the location of the adapter (or where it is supposed to be). We are finalising the publication and I still need to update the wiki with a proper manual, so let me know if you run into difficulties.
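The idea of detecting a palindromic read by aligning it against itself can be sketched in a few lines of Python. This is a minimal illustration, not Pacasus's actual implementation: the function names and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for the example. A read whose second half is the reverse complement of its first half aligns almost perfectly to its own reverse complement, so a high local alignment score hints at a missed adapter.

```python
# Illustrative sketch only; names and scoring values are assumptions,
# not taken from Pacasus or pyPaSWAS.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def palindrome_score(read):
    """Score a read against its own reverse complement."""
    return local_alignment_score(read, reverse_complement(read))


if __name__ == "__main__":
    arm = "ACGTTGCAAGGCTA"
    chimeric = arm + reverse_complement(arm)  # adapter missing between the arms
    print(palindrome_score(chimeric), "out of a maximum of", 2 * len(chimeric))
```

Any real read produces some spurious local alignment against its reverse complement, so a practical tool needs a threshold relative to read length; the sketch only shows where the signal comes from.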

Best, Sven

danshu commented 7 years ago

Hi Sven,

It is exactly what I'm looking for! So excited! Since Pacasus is dependent on pyPaSWAS, I want to know how to install pyPaSWAS so that Pacasus can correctly import modules from pyPaSWAS.

Best, Danshu

swarris commented 7 years ago

pyPaSWAS is used as a submodule. I saw that the readme was not up to date. Run these commands in the Pacasus root folder:

    git submodule init
    git submodule update

Then you should be okay. If you're using an NVIDIA device, install the cuda sdk (see readme). For other GPUs or CPU usage, download the appropriate OpenCL driver and compiler.

danshu commented 7 years ago

Thanks! May I also ask how to run Pacasus on PacBio raw reads? Do I need two input files? The usage message says:

"Usage: pacasus.py [options] FILE_1

This program performs a Smith-Waterman alignment of all sequences in FILE_1 against all sequences in FILE_2. Both files should be in the fasta format."

swarris commented 7 years ago

Ah, some left-over messages from pyPaSWAS (which indeed needs two files). Pacasus needs only one:

    python pacasus.py myFasta.fa -o processedReads.fa

or

    python pacasus.py myFastQ.fq -1 fastq -o processedReads.fa

should do the trick. I can recommend using the -L log.txt option for logging.

danshu commented 7 years ago

My test run exited with "ImportError: No module named pycuda.driver". I then tried to install pycuda with "sudo pip install pyCuda", but that failed as well:

" gcc -pthread -fno-strict-aliasing -fwrapv -Wall -O3 -DNDEBUG -fPIC -DBOOST_PYTHON_SOURCE=1 -DHAVE_CURAND=1 -DPYGPU_PACKAGE=pycuda -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PYCUDA=1 -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_THREAD_BUILD_DLL=1 -Dboost=pycudaboost -DBOOST_ALL_NO_LIB=1 -Isrc/cpp -Ibpl-subset/bpl_subset -I/usr/local/lib/python2.7/site-packages/numpy/core/include -I/usr/local/include/python2.7 -c src/cpp/cuda.cpp -o build/temp.linux-x86_64-2.7/src/cpp/cuda.o
In file included from src/cpp/cuda.cpp:1:0:
src/cpp/cuda.hpp:14:18: fatal error: cuda.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

----------------------------------------

Command "/usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-Tr_ism/pyCuda/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-XYg2pU-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-Tr_ism/pyCuda/"

swarris commented 7 years ago

Could you tell me what type of hardware you have: CPU name and GPU vendor? By default, Pacasus uses an NVIDIA GPU. For this you need to install the CUDA SDK (which contains 'cuda.h'). After that you can install pyCUDA. You can also use a GPU that supports OpenCL, or your CPU. For this you need to install OpenCL and pyOpenCL, and pass these options to Pacasus:

    --device_type=CPU --platform_name=Intel --framework=opencl

For very long reads, >20kb (depending on your hardware), you need a CPU with OpenCL due to memory constraints on a GPU.
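A rough back-of-the-envelope calculation shows why read length runs into device memory. The sketch below assumes that a full score matrix is kept for the read against its reverse complement, and that each cell takes 4 bytes; both are assumptions for illustration, and the real memory layout of pyPaSWAS may differ. The function name is hypothetical.

```python
# Rough estimate only: assumes one full n x n matrix and 4 bytes per cell.
# Neither assumption is taken from the Pacasus or pyPaSWAS source code.
def sw_matrix_bytes(read_length, bytes_per_cell=4):
    """Bytes needed for a square alignment matrix of a read against itself."""
    return read_length * read_length * bytes_per_cell


if __name__ == "__main__":
    for length in (5_000, 20_000, 50_000):
        gib = sw_matrix_bytes(length) / 2**30
        print(f"{length:>6} bases -> {gib:6.2f} GiB")
```

Because the requirement grows with the square of the read length, doubling the read length quadruples the memory, which is why host RAM on a CPU can cope with reads that do not fit on a GPU.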

Pacasus will not run without OpenCL or CUDA (but it does not need both). See the readme for more information on this.
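Before launching a run, it can help to check which of the two Python bindings is installed. The following is a small sketch using only the standard library; the function name is hypothetical and not part of Pacasus. It only tells you whether the `pycuda` or `pyopencl` package can be found, not whether a working driver sits behind it.

```python
import importlib.util


def available_frameworks():
    """Report which compute bindings are installed in this Python environment."""
    return {
        "cuda": importlib.util.find_spec("pycuda") is not None,
        "opencl": importlib.util.find_spec("pyopencl") is not None,
    }


if __name__ == "__main__":
    for framework, installed in available_frameworks().items():
        print(f"{framework}: {'installed' if installed else 'missing'}")
```

If both entries report missing, an ImportError such as "No module named pycuda.driver" is to be expected.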

danshu commented 7 years ago

Here is the CPU information:

    vendor_id  : GenuineIntel
    model name : Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz

danshu commented 7 years ago

I have installed OpenCL using:

    sudo apt update
    sudo apt install ocl-icd-opencl-dev
    sudo apt-get install python-pyopencl
    sudo pip install pyOpenCL

The new error message is:

    X server found. dri2 connection failed!
    beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
    (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
    Traceback (most recent call last):
      File "/tools/assembler/Pacbio_tools/Pacasus/pacasus.py", line 12, in <module>
        ppw.run()
      File "/tools/assembler/Pacbio_tools/Pacasus/pacasus/pacasusall.py", line 106, in run
        self._set_program()
      File "/tools/assembler/Pacbio_tools/Pacasus/pacasus/pacasusall.py", line 82, in _set_program
        self.program = Palindrome(self.logger, self.score, self.settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/Programs.py", line 357, in __init__
        Aligner.__init__(self, logger, score, settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/Programs.py", line 53, in __init__
        self.smith_waterman = SmithWatermanCPU(self.logger, self.score, settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/SmithWatermanOcl.py", line 320, in __init__
        SmithWatermanOcl.__init__(self, logger, score, settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/SmithWatermanOcl.py", line 38, in __init__
        self._set_platform(self.settings.platform_name)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/SmithWatermanOcl.py", line 112, in _set_platform
        raise RuntimeError('Failed to find platform')
    RuntimeError: Failed to find platform

swarris commented 7 years ago

Pacasus cannot connect to an OpenCL device. Your settings appear to be correct (--device_type=CPU --platform_name=Intel --framework=opencl), as it loads the SmithWatermanCPU class. I've just tested it on a server with Intel CPUs and no GPUs connected, and it works fine. Silly question maybe, but did you restart your machine? In some cases this is required to activate the OpenCL drivers. Then, please run the example script from pyOpenCL: https://documen.tician.de/pyopencl/
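A quick way to see what the "Failed to find platform" error refers to is to list the OpenCL platforms that pyOpenCL can see and compare their names with the value passed to --platform_name. The sketch below uses `pyopencl.get_platforms()`, which is part of pyOpenCL; the helper names are hypothetical, and the case-insensitive substring match is an assumption about how the name is compared, not a statement about the pyPaSWAS code.

```python
def list_opencl_platforms():
    """Return the names of all OpenCL platforms visible to pyOpenCL."""
    try:
        import pyopencl as cl
    except ImportError:
        return []  # pyOpenCL itself is not installed or cannot be loaded
    try:
        return [platform.name for platform in cl.get_platforms()]
    except cl.Error:
        # Raised when the ICD loader reports no usable platform.
        return []


def match_platform(platform_names, wanted):
    """Assumed matching rule: case-insensitive substring comparison."""
    wanted = wanted.lower()
    return [name for name in platform_names if wanted in name.lower()]


if __name__ == "__main__":
    names = list_opencl_platforms()
    print("Platforms found:", names)
    print("Matching 'Intel':", match_platform(names, "Intel"))
```

An empty list here means the problem lies in the OpenCL installation rather than in the Pacasus options. A driver that crashes while opening the device, as in a segmentation fault, cannot be caught this way.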

danshu commented 7 years ago

Thanks! I will restart the server later next week because there are jobs running now.

Running the example script gives the following error:

    X server found. dri2 connection failed!
    Device open failed, aborting...
    Segmentation fault (core dumped)

I will try and post again after restarting the server.

swarris commented 7 years ago

Did you manage to get everything up and running?

danshu commented 7 years ago

Unfortunately, I have jobs that have been running for weeks, and I may need to wait until they finish before I can restart our server.

swarris commented 7 years ago

I'm not sure a restart is always required:

https://software.intel.com/en-us/forums/opencl/topic/390630
https://streamcomputing.eu/blog/2011-06-24/install-opencl-on-debianubuntu-orderly/

I can't remember if I needed to restart mine though.

danshu commented 7 years ago

Thanks! I have installed OpenCL from http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/ and the same error occurs again.

    X server found. dri2 connection failed!
    Device open failed, aborting...
    X server found. dri2 connection failed!
    X server found. dri2 connection failed!
    Device open failed, aborting...
    X server found. dri2 connection failed!
    beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
    (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
    X server found. dri2 connection failed!
    Device open failed, aborting...
    X server found. dri2 connection failed!
    beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
    (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
    cl_get_gt_device(): error, unknown device: ffffffff
    cl_get_gt_device(): error, unknown device: ffffffff
    Traceback (most recent call last):
      File "/tools/assembler/Pacbio_tools/Pacasus/pacasus.py", line 12, in <module>
        ppw.run()
      File "/tools/assembler/Pacbio_tools/Pacasus/pacasus/pacasusall.py", line 106, in run
        self._set_program()
      File "/tools/assembler/Pacbio_tools/Pacasus/pacasus/pacasusall.py", line 82, in _set_program
        self.program = Palindrome(self.logger, self.score, self.settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/Programs.py", line 357, in __init__
        Aligner.__init__(self, logger, score, settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/Programs.py", line 53, in __init__
        self.smith_waterman = SmithWatermanCPU(self.logger, self.score, settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/SmithWatermanOcl.py", line 320, in __init__
        SmithWatermanOcl.__init__(self, logger, score, settings)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/SmithWatermanOcl.py", line 38, in __init__
        self._set_platform(self.settings.platform_name)
      File "/tools/assembler/Pacbio_tools/Pacasus/pypaswas/pyPaSWAS/Core/SmithWatermanOcl.py", line 112, in _set_platform
        raise RuntimeError('Failed to find platform')
    RuntimeError: Failed to find platform

swarris commented 7 years ago

It appears to be trying to connect to a GPU, not the CPU. The link you provided points to the SDK. Did you also install the drivers? https://software.intel.com/en-us/articles/opencl-drivers#latest_CPU_runtime

danshu commented 7 years ago

When I try to install the driver, it reports:

    Missing critical prerequisite
    -- The Beignet driver is detected
    The Beignet driver is installed. Please, remove it before proceeding the installation.

This happens even after I have uninstalled Beignet.

swarris commented 7 years ago

Probably the driver is still loaded. I have no experience with Beignet, but you could check with 'lsmod' which modules are present. If you find a Beignet module, try to remove it with 'modprobe -r'. Or a reboot might help :P ;-)
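Besides kernel modules, OpenCL implementations on Linux are usually registered with the ICD loader through small `.icd` files, conventionally under /etc/OpenCL/vendors. A leftover Beignet entry there might explain why it is still detected even when nothing shows up in 'lsmod'. The sketch below lists those files; the function name is hypothetical, and the default directory is the common convention, which may differ on a given system.

```python
from pathlib import Path


def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library path it registers."""
    entries = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return entries  # no ICD directory on this system
    for icd_file in sorted(directory.glob("*.icd")):
        entries[icd_file.name] = icd_file.read_text().strip()
    return entries


if __name__ == "__main__":
    for name, library in list_icd_entries().items():
        print(f"{name}: {library}")
```

Each entry names the shared library that the loader will try to open, so an entry pointing at a Beignet library would be a candidate for removal before installing the Intel CPU runtime.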

danshu commented 7 years ago

Thanks! I did not find Beignet with 'lsmod', so I may need to reboot later.