xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

How to set number of threads? #19

Closed cousins closed 4 years ago

cousins commented 4 years ago

The end of constants.py file has:

# default number of threads to use in calculation if it is not given
nthread = 4
#nthread = 16
#nthread = 32

What is the way to set nthread for a given run? I have tried:

python3 isescan.py nthread 12 EC1.SPAdes.contigs.fa proteome hmm 

and

python3 isescan.py nthread=12 EC1.SPAdes.contigs.fa proteome hmm 

but it just shows:

usage: isescan [-h] [--version] seqfile path2proteome path2hmm
isescan: error: unrecognized arguments: proteome hmm

Thanks,

Steve
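A minimal sketch of why the first command above ends with "unrecognized arguments: proteome hmm". It assumes isescan.py builds an argparse parser with the three positional arguments shown in the usage line; the function name leftover_arguments is hypothetical and this is not ISEScan's actual code. The bare word nthread is consumed as seqfile, 12 as path2proteome, the FASTA file as path2hmm, and the last two tokens are left over.

```python
import argparse


def build_parser():
    # Mirrors the usage line: three required positional arguments.
    # (Assumption: the real parser is defined along these lines.)
    parser = argparse.ArgumentParser(prog="isescan")
    parser.add_argument("seqfile")
    parser.add_argument("path2proteome")
    parser.add_argument("path2hmm")
    return parser


def leftover_arguments(argv):
    # parse_known_args returns unrecognized tokens instead of exiting,
    # which makes it easy to see what argparse could not place.
    return build_parser().parse_known_args(argv)


ns, extras = leftover_arguments(
    ["nthread", "12", "EC1.SPAdes.contigs.fa", "proteome", "hmm"]
)
print(ns.seqfile, ns.path2proteome, ns.path2hmm, extras)
```

Because positional arguments are filled strictly by position, a thread count can only be passed this way once the parser defines a dedicated option for it.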

xiezhq commented 4 years ago

Hi Steve,

You need to set the number of threads and/or processes in constants.py; the current version of ISEScan does not support setting it on the command line (I will add this option to ISEScan soon, probably this weekend). I suggest setting the number of processes you want to use, e.g. nproc = 4 or greater if your computer has 4 or more CPU cores. Setting the number of threads in Python does not always make sense.

Xie

cousins commented 4 years ago

Hi Xie,

Thanks. I tried changing both nthread and ncpu and it has no effect as far as I can tell. It seems to set it at 4 no matter what.

Best,

Steve


cousins commented 4 years ago

Hi Xie,

It seems that maybe the changes in constants.py are not being incorporated. Do I need to do something to get them to take effect? I tried changing the value for tmpdir as well and that is not working. It is still trying to use /tmp which is too small for what we need. I tried changing it to /scratch.

Thanks,

Steve


xiezhq commented 4 years ago

Hi Steve,

isescan.py can only use multiple CPU cores for FragGeneScan and HMMER, the two CPU-intensive steps in ISEScan (the HMMER search is the rate-limiting step). The nproc parameter in constants.py is not meant for ISEScan users; it is used by some test code I wrote while developing ISEScan. I will remove nproc and that test code soon to avoid confusing users. I will update ISEScan today to add an optional nthread option to isescan.py so that users can set the number of CPU cores to use, and I will remove the nthread and nproc parameters from constants.py accordingly.

ISEScan does not set a temporary directory such as the /tmp/ you mentioned. If you run

python3 isescan.py NC_012624.fna proteome hmm

ISEScan writes intermediate files and final results to three directories: proteome/, hmm/ and prediction/. The final results are in prediction/; some intermediate/temporary files are written to proteome/ and hmm/. Could you give more details on 'It is still trying to use /tmp which is too small for what we need'? Linux applications usually put temporary files in /tmp/ unless the environment variable TMPDIR, TEMP or TMP points elsewhere. So you may try setting one of these environment variables to a directory under /scratch:

export TMPDIR=/scratch/tmp4steve
mkdir -p $TMPDIR
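To illustrate the environment-variable suggestion: Python's standard tempfile module checks TMPDIR, TEMP and TMP (in that order) before falling back to locations such as /tmp. Whether ISEScan or the tools it calls create their temporary files through this mechanism is an assumption, not something confirmed in this thread. The sketch uses a throwaway directory as a stand-in for /scratch/tmp4steve, and resolve_tempdir is a hypothetical helper name.

```python
import os
import tempfile


def resolve_tempdir(env_value):
    # Point TMPDIR at the given directory, then ask Python where
    # temporary files would be created.
    os.environ["TMPDIR"] = env_value
    tempfile.tempdir = None  # clear the cached answer so the lookup runs again
    return tempfile.gettempdir()


# Stand-in for a directory such as /scratch/tmp4steve; it must exist
# and be writable, otherwise tempfile skips it.
scratch = tempfile.mkdtemp()
resolved = resolve_tempdir(scratch)
print(resolved)
```

The directory has to exist before the program starts, so it is safest to create it first and export TMPDIR afterwards.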

If you have lots of genome sequences (either multiple sequences in a genome or one sequence per genome), you can try running multiple isescan.py instances in parallel on your computer, as I described in 'How to run a set of genomes in a row' (https://github.com/xiezhq/ISEScan/issues/13).
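One possible way to launch several runs side by side, sketched with the standard library only. It is not the procedure from issue #13; the helper names build_command and run_all, the max_jobs parameter and the file names are illustrative assumptions. The command form is the one used in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(seqfile):
    # Command form taken from the thread; the directory names are the
    # ones used there, the sequence file is supplied by the caller.
    return ["python3", "isescan.py", seqfile, "proteome", "hmm"]


def run_all(seqfiles, max_jobs=2, runner=subprocess.run):
    # Threads are enough here: each one only waits for a child process.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda s: runner(build_command(s)), seqfiles))


# Dry run: the runner just echoes each command instead of executing it.
planned = run_all(["genome1.fna", "genome2.fna"], runner=lambda cmd: cmd)
print(planned)
```

Leaving runner at its default executes the commands for real; max_jobs then caps how many isescan.py processes run at the same time.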

Xie

cousins commented 4 years ago

Hi Xie,

I tried changing tmpdir in constants.py to /scratch instead of /tmp since our /tmp directory isn't big enough but ISEScan is still sending errors indicating that /tmp is running out of space. What do we need to do to change from /tmp to /scratch?

Thanks,

Steve



cousins commented 4 years ago

Sorry, I didn't get your previous message in my email. I will try setting TEMP, TMP, and TMPDIR to see if that works.

xiezhq commented 4 years ago

ISEScan has been upgraded to v1.7.2.1, and you can conveniently install it with Bioconda (see https://github.com/xiezhq/ISEScan#Bioconda-install). ISEScan >= v1.7.2 can set the number of threads with the command option --nthread (FragGeneScan and HMMER, which ISEScan calls, will use multiple threads; these two steps are the rate-limiting steps in ISEScan), for example: isescan.py NC_012624.fna proteome hmm --nthread 2.

Xie
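A small sketch of assembling the v1.7.2+ command from a script, for example inside a batch job. The command layout follows the example above; the helper name isescan_command and the choice to default to every available core are assumptions made for illustration.

```python
import os


def isescan_command(seqfile, nthread=None):
    # Assumed default: use all cores; os.cpu_count() may return None.
    if nthread is None:
        nthread = os.cpu_count() or 1
    if nthread < 1:
        raise ValueError("nthread must be a positive integer")
    return ["isescan.py", seqfile, "proteome", "hmm", "--nthread", str(nthread)]


command = isescan_command("NC_012624.fna", nthread=2)
print(" ".join(command))
```

The resulting list can be passed directly to subprocess.run; on a shared cluster it is usually better to pass the core count granted by the scheduler than to rely on the default.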