Closed Biofarmer closed 2 years ago
Hi, the two ways should give the same result. With thousands of genomes, running each genome separately in parallel on a computer cluster should be much faster.
OK, thanks. If running genomes individually, should I just use 4 threads (-j 4) for each genome, since HMMSEARCH_THREADS is set to 4? With more threads (like -j 10), there would be no effect on the HMM search step, and the total time spent should be almost the same, right?
@jiarong Thanks. May I ask some more questions?
I have installed virsorter2, along with its databases and dependencies, by running virsorter setup -d db -j 4 on a computer cluster. The cluster is managed with Slurm, and once a job is submitted, no internet is available. Is any internet access needed while the job runs if the databases and dependencies have already been installed?
Another question, about manually installing the databases and dependencies: I found that the dependencies in conda_envs are not installed up front but are installed when running the first sample, unlike virsorter setup -d db -j 4, which installs the dependencies in conda_envs along with downloading the databases. With a manual database installation, will the dependencies in conda_envs be installed again for every sample, or, once installed, will all remaining samples skip this step?
How many minutes does the test fa take when running 'virsorter run -w test.out -i test.fa --min-length 1500 -j 4 all'? I ran this command after setting 'virsorter config --set HMMSEARCH_THREADS=4', and it took 30 mins to finish. However, when inspecting the process with top, only one thread (100-150 in %CPU) seems to be used.
Thanks
- Right, -j 4 should be enough for a bacterial genome, and increasing threads won't improve speed much. 1&2: Once the db and dependencies are installed, an internet connection is NOT needed, and dependency installation is skipped.
- This could happen. HMMER can be limited by things other than CPU. On computer clusters, it's likely the speed of reading the data over the network. It usually takes 10 - 20 mins on my server.
@jiarong Thanks. May I further confirm with you: when manually installing the databases and dependencies, I found that the dependencies in conda_envs are not installed up front but are installed when running the first sample. Is this the expected way for the dependencies in conda_envs to be installed with a manual installation?
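For readers who want to follow the one-genome-per-job advice, here is a minimal sketch of generating one command per genome. It reuses only the flags from the test command quoted in this thread (-w, -i, --min-length, -j, all). The helper name build_virsorter_commands, the example paths, and the per-genome output directory naming are illustrative assumptions, not part of virsorter2.

```python
from pathlib import Path
import shlex


def build_virsorter_commands(genomes, out_root, threads=4, min_length=1500):
    """Return one `virsorter run` argument list per genome FASTA.

    Hypothetical helper: it only builds the commands, it does not run them.
    Flags mirror the test command quoted in the thread.
    """
    commands = []
    for genome in genomes:
        genome = Path(genome)
        # Assumption: give each genome its own working directory
        # so parallel jobs do not write into the same place.
        workdir = Path(out_root) / f"{genome.stem}.out"
        commands.append([
            "virsorter", "run",
            "-w", str(workdir),
            "-i", str(genome),
            "--min-length", str(min_length),
            "-j", str(threads),
            "all",
        ])
    return commands


if __name__ == "__main__":
    # Example paths are placeholders.
    for cmd in build_virsorter_commands(["genomes/g1.fa", "genomes/g2.fa"], "results"):
        print(shlex.join(cmd))
```

Each printed line could then be submitted as its own cluster job (for example one Slurm task per line), keeping -j 4 per job as suggested above.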
Thanks
Dependencies can only be installed automatically. Usually, running the test example should have that done.
Yes, I ran the test and found that the dependencies were installed automatically. But after running 'virsorter config --init-source --db-dir=./db' and before running the test example, conda_envs in the /db directory is empty and the dependencies are not installed, right?
Correct.
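Since compute nodes have no internet, it may help to confirm that conda_envs is no longer empty before submitting offline jobs. Below is a minimal sketch of such a check. The helper name conda_envs_populated is hypothetical, and treating a non-empty conda_envs directory as "dependencies installed" is an assumption drawn from the exchange above, not a guarantee that every environment is complete.

```python
from pathlib import Path
import tempfile


def conda_envs_populated(db_dir):
    """Return True if <db_dir>/conda_envs exists and contains at least one entry.

    Assumption: a non-empty conda_envs directory means the first
    (internet-connected) run has already created the environments.
    """
    envs = Path(db_dir) / "conda_envs"
    return envs.is_dir() and any(envs.iterdir())


if __name__ == "__main__":
    # Demonstration on a throwaway directory rather than a real db.
    with tempfile.TemporaryDirectory() as db:
        print(conda_envs_populated(db))        # no conda_envs directory yet
        (Path(db) / "conda_envs").mkdir()
        print(conda_envs_populated(db))        # directory exists but is empty
```

If the check fails, running the test example once on a node with internet access, as suggested above, should populate the directory.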
Thanks, it is good to know more about virsorter2. May I further confirm whether the manually downloaded database (https://osf.io/v46sc/download) is up to date and exactly the same as the database from the 'virsorter setup' command (virsorter setup -d db -j 4)? Thanks
@jiarong Sorry, two follow-up questions
Thanks
virsorter setup
@jiarong Thank you very much!
Hi Jiarong,
I am using virsorter2 v2.2.3 to check thousands of genomes for virus sequences. I have a few questions before running:
Thanks Wang