jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

the program seemed to stay at step 3 #33

Open LiuPeng-nju opened 3 years ago

LiuPeng-nju commented 3 years ago

Hi, I ran the program with the command virsorter run --prep-for-dramv -w DNA.results.out -i /data/LiuPeng/DNA.results/contigs.fa -j 110 all and got the following output:

[2020-12-25 11:19 INFO] # of seqs < 0 bp and removed: 0
[2020-12-25 11:19 INFO] # of circular seqs: 9682
[2020-12-25 11:19 INFO] # of linear seqs : 3076056
[2020-12-25 11:19 INFO] Finish spliting circular contig file with common rbs
[2020-12-25 11:19 INFO] Finish spliting circular contig file with NCLDV rbs
[2020-12-25 11:20 INFO] Finish spliting linear contig file with common rbs
[2020-12-25 11:20 INFO] Finish spliting linear contig file with NCLDV rbs
[2020-12-25 11:24 INFO] Step 1 - preprocess finished.
[2020-12-26 04:13 INFO] Step 2 - extract-feature finished.

There are ~3 M contigs in my FASTA file, and the time-consuming step 2 took ~17 hours. But the program has now been in step 3 for more than 2 days.

I also used htop to check CPU and memory usage, and found that provirus.py is running (CPU 100%, memory 0%).

So, what's the possible cause? Thank you.

jiarong commented 3 years ago

I am not sure. CPU 100% means the process is running, not stuck. How many provirus.py processes are there in htop? Are most of the contigs very long (> 100 kbp)?
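
For anyone who would rather count the provirus.py processes from a script than read them off htop, here is a minimal sketch. It is not part of VirSorter2; the function names are made up for illustration, and it assumes a POSIX `ps` that accepts `-A -o args` to list the full command line of every process.

```python
import subprocess


def count_matching_processes(ps_output: str, pattern: str) -> int:
    """Count lines of `ps` output whose command line contains `pattern`."""
    return sum(1 for line in ps_output.splitlines() if pattern in line)


def running_process_count(pattern: str) -> int:
    """Run `ps` and count processes whose command line contains `pattern`.

    Assumption: `ps -A -o args` is available (POSIX options), printing one
    command line per running process.
    """
    result = subprocess.run(
        ["ps", "-A", "-o", "args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_matching_processes(result.stdout, pattern)


# Example call (needs a system with `ps`):
#   print(running_process_count("provirus.py"))
```

If the count is far below the value passed to -j, most of the requested threads are idle during this step.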

LiuPeng-nju commented 3 years ago

Thanks for your reply. Indeed, the process is still running and has jumped to another python script. But there are only 153 contigs longer than 50 kbp.
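
A count like the one above can be reproduced with a short script. The following is only a sketch, not how VirSorter2 itself measures contigs: the function names are hypothetical, and it assumes a plain, uncompressed FASTA file in which every record starts with a ">" header line.

```python
from typing import Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence_length) for each record in FASTA-formatted lines."""
    name = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length  # emit the previous record
            name = line[1:].strip()
            length = 0
        elif name is not None:
            length += len(line)  # sequences may span several lines
    if name is not None:
        yield name, length  # emit the last record


def count_longer_than(lines: Iterable[str], min_len: int) -> int:
    """Count records whose sequence is strictly longer than `min_len` bases."""
    return sum(1 for _, n in fasta_lengths(lines) if n > min_len)


# Example call ("contigs.fa" is a placeholder path):
#   with open("contigs.fa") as handle:
#       print(count_longer_than(handle, 50_000))
```

The parser streams the file line by line, so memory use stays low even with millions of contigs.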

jiarong commented 3 years ago

Right, these contigs are not that long. With 110 threads (-j), step 3 should finish within hours. Are you running other processes that take up many CPU threads?