xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

How to run a set of genomes in a row #13

Closed manuss11 closed 4 years ago

manuss11 commented 4 years ago

Hi, I'm using ISEScan, but I'm new to Linux and I don't know how to run a set of 200 genomes in FASTA format in one single script. I tried: python3 /home/qiime2/ISEScan-1.7/isescan.py *.fa proteome hmm

But it doesn't seem to be correct.

Many thanks

xiezhq commented 4 years ago

Thanks for your interest in ISEScan.

Two questions before we go to the next step to solve your issue.

  1. Is each fasta file a genome?
  2. Are you submitting each ISEScan computing job to a Linux cluster system, or running it on an independent Linux server/workstation (like your desktop/laptop but more powerful)?

Thanks, Xie

xiezhq commented 4 years ago

I am trying to answer your questions based on a few assumptions:

Now, let's run all 200 genomes with a single command and then wait for all the computing jobs to complete (probably several days or weeks, depending on how many hours each of your 200 genomes requires on average). If your computer has 8 CPU cores, you can execute the command below:

nohup cat test.fna.list | xargs -n 1 -P 4 -I{} python3 /home/qiime2/ISEScan-1.7/isescan.py {} proteome hmm > log.txt &

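In the command line,

  1. test.fna.list is a text file listing your genome files, one FASTA file name per line.
  2. xargs -I{} runs the ISEScan command once for each line, replacing {} with the file name.
  3. -P 4 runs up to 4 ISEScan jobs at the same time.
  4. > log.txt saves the screen output to log.txt; nohup and the trailing & are meant to keep the work running in the background.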
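For anyone who prefers a single script over the xargs one-liner, the same idea can be sketched in Python. This is only a minimal sketch, not part of ISEScan: the function names (`find_genomes`, `build_command`, `run_all`) are hypothetical, the isescan.py path and the `proteome hmm` arguments are copied from the command in this thread, and `jobs=4` mirrors `-P 4`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Path copied from this thread (assumption); adjust to your own installation.
ISESCAN = "/home/qiime2/ISEScan-1.7/isescan.py"


def find_genomes(folder=".", pattern="*.fa"):
    """Return the FASTA files in `folder`, sorted by name."""
    return sorted(Path(folder).glob(pattern))


def build_command(fasta, isescan=ISESCAN):
    """Build the argument list for one ISEScan run, as used in this thread."""
    return ["python3", isescan, str(fasta), "proteome", "hmm"]


def run_one(fasta, isescan=ISESCAN):
    """Run ISEScan on one genome and return (file name, exit code)."""
    result = subprocess.run(build_command(fasta, isescan))
    return str(fasta), result.returncode


def run_all(fasta_files, jobs=4, runner=run_one):
    """Run `runner` on every file, at most `jobs` at a time (like xargs -P)."""
    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(runner, fasta_files))
```

Saved in a file with a final line such as `print(run_all(find_genomes(), jobs=4))`, this could be started in the background the same way as the one-liner, for example with `nohup python3 your_script.py > log.txt &`, where `your_script.py` is a placeholder name. A non-zero exit code in the returned list points at a genome whose run needs a closer look.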

It might take several days or weeks for 200 genomes to complete, depending on how many CPU cores your computer has and how fast each core is. Please do not launch too many ISEScan jobs at once, because each job consumes part of your computer's RAM. You can always run a test on one genome first to estimate how many GB of RAM and how many hours a genome requires.
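One rough way to do that test run and turn it into a job count and a time estimate is sketched below. It is an illustration under assumptions, not an ISEScan feature: the helper names (`measure`, `max_parallel_jobs`, `total_hours`) are hypothetical, the `resource` module is Unix-only, and on Linux the reported peak is the largest single child process, so it may underestimate a pipeline that runs several helper programs at once.

```python
import math
import resource  # Unix-only; fine for the Linux machine discussed here
import subprocess
import time


def measure(cmd):
    """Run one command; return (exit code, wall-clock hours, peak RAM in GB)."""
    start = time.monotonic()
    code = subprocess.run(cmd).returncode
    hours = (time.monotonic() - start) / 3600
    # On Linux ru_maxrss is in kilobytes and reports the largest single
    # child process waited for so far, not the sum over all child processes.
    peak_gb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024**2
    return code, hours, peak_gb


def max_parallel_jobs(cpu_cores, ram_gb, gb_per_job):
    """Largest job count that fits both the CPU cores and the RAM."""
    return max(1, min(cpu_cores, int(ram_gb // gb_per_job)))


def total_hours(n_genomes, hours_per_genome, jobs):
    """Rough wall-clock estimate: genomes run in batches of `jobs`."""
    return math.ceil(n_genomes / jobs) * hours_per_genome
```

For example, if one test genome took a hypothetical 2 hours, `total_hours(200, 2.0, 4)` gives 100 hours for 200 genomes at 4 jobs at a time. The real figure will vary with genome size, so treat it as an order-of-magnitude guide.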

Hope it helps.

Xie

manuss11 commented 4 years ago

Thanks a lot Xie, that is just what I needed. Each genome is in a multi-FASTA file, and I'm using a computer with 8 cores and 32 GB RAM. My priority was to run all genomes using just one script. Running in parallel was not a goal in itself, but it's good to know. Many thanks!