jts / nanocorrect

Experimental pipeline for correcting nanopore reads
MIT License

POA runs as a single thread? #7

Closed hjjansen closed 9 years ago

hjjansen commented 9 years ago

Hi, I've seen your excellent presentation at London Calling and now want to run the nanocorrect pipeline on MinION data from a yeast strain. Installation wasn't a problem, and DALIGNER runs fine and produces some sensible output. The nanocorrect.py script also runs fine. The only problem is that poa runs as a single thread, so it will take a long time to process all 26489 reads. A test showed that it took 3 hrs to process 300 reads on my system.

I'm just learning to work with Python (I'm very wet as Ewan calls it), but in the nanocorrect.py script I see this code at lines 155 and 156:

    cmd = "poa -read_fasta %s -clustal %s -hb %s" % (in_fn, out_fn, blosum_file)
    p = subprocess.Popen(cmd, shell=True, stderr=DEVNULL)

Am I correct in assuming that this should start multiple instances of poa? If so, have you got any pointers on where to look to solve this problem? If not, is it possible to run poa on multiple threads?

My system runs Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-40-generic x86_64)
Python version: Python 2.7.6
Compiler: [GCC 4.8.2]

nickloman commented 9 years ago

Hi Hans - we usually parallelise this step using GNU parallel as it is slow as you say. The pipeline script we developed shows how we do it: https://github.com/jts/nanopore-paper-analysis/blob/master/full-pipeline.make

Basically, you split the input into ranges using makerange.py and run one nanocorrect.py job per range. For example, if you were running 32 jobs in parallel:

    python makerange.py reads_to_correct.fasta > ranges.txt
    cat ranges.txt | parallel -P 32 python nanocorrect.py reads_to_correct > reads_to_correct.corrected.fasta
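
For anyone who would rather see the same split-and-fan-out idea in Python, here is a minimal sketch. It is not part of nanocorrect: the function names, the "start:end" range format, and the chunk size of 50 are all assumptions made for illustration, and it is written for Python 3 rather than the Python 2.7 mentioned above. A thread pool is enough here because each worker thread only waits on its own child process.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def make_ranges(n_reads, chunk_size):
    """Split read indices 0..n_reads into half-open 'start:end' chunks.

    The 'start:end' format is an assumption for this sketch.
    """
    return ["%d:%d" % (start, min(start + chunk_size, n_reads))
            for start in range(0, n_reads, chunk_size)]


def run_job(cmd):
    """Run one command (an argument list) and return its stdout as bytes."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout


def run_parallel(cmds, n_jobs):
    """Run the commands with at most n_jobs at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(run_job, cmds))


if __name__ == "__main__":
    # 26489 is the read count from the question; 50 is a hypothetical chunk size.
    ranges = make_ranges(26489, 50)
    # Stand-in command that just echoes its range argument. A real run would
    # call nanocorrect.py with the range as its last argument instead.
    cmds = [[sys.executable, "-c", "import sys; print(sys.argv[1])", r]
            for r in ranges[:4]]
    for out in run_parallel(cmds, n_jobs=2):
        sys.stdout.write(out.decode())
```

A single subprocess.Popen call launches exactly one child process, so the parallelism has to come from starting several jobs at once, whether with GNU parallel as above or with a pool like this one.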

hjjansen commented 9 years ago

Many thanks Nick,

That worked like a charm. Shame on me: next time I will RTFM.