widdowquinn opened 5 years ago
Thanks for the report, Sarah. I can't seem to reproduce the behaviour you see. For me, using:

- `pyani` v0.2.7 (the current `master` branch)
- `mummer` v3.1
- `python` 3.6.6

and executing the command described in #20:
```bash
average_nucleotide_identity.py -v -f \
    -i tests/test_ani_data/ \
    -o tests/test_ANIm_output/ \
    -g --gformat png,pdf,eps \
    --classes tests/test_ani_data/classes.tab --labels tests/test_ani_data/labels.tab \
    --workers 1
```
I see only one `mummer` process.
With the alternative command:
```bash
average_nucleotide_identity.py -v -f \
    -i tests/test_ani_data/ \
    -o tests/test_ANIm_output/ \
    -g --gformat png,pdf,eps \
    --classes tests/test_ani_data/classes.tab --labels tests/test_ani_data/labels.tab \
    --workers 4
```
I see four `mummer` processes.
For me, `--workers` seems to work as expected.
Can you please describe your system and how you're running the command?
I am using `pyani` 0.2.7 on CentOS Linux 7 (Core) with `mummer` 4.0.0beta2.
This is the command I am running:
```bash
average_nucleotide_identity.py -i test/ -o out --workers 3
```
There are 16 nucleotide sequences in `test/`. Although I expect `mummer` to run on 3 cores, it uses all of the available cores.
Hi Sarah,
The `--workers` option governs the number of concurrent `mummer` jobs, which only indirectly determines the number of cores that `mummer` uses. `mummer` 3 is single-threaded, but `mummer` 4 is multithreaded.
As a user, on your multicore machine would you prefer to run a single `mummer` job using all available cores, or multiple single-threaded `mummer` jobs?
L.
I see! I will try it with `mummer` 3 then.
For me, being able to specify a number of cores is what makes a program flexible enough to run on e.g. a cluster, and it is therefore essential for my working environment. My preferred option would be one job on multiple cores.
Thanks!
As a sensible change to `pyani` behaviour, I think I'll have to test for which version of `mummer` is present, then do one of the following:

- restrict `mummer` 4 to single-threading
- restrict `mummer` 4 to a specified number of workers/cores
- let `mummer` do its own thing

For a cluster like our local cluster (which is a common setup), I'm not sure it's straightforward to tailor the total number of cores requested to suit those available on a node, but we can specify a minimum number of free cores for each single job (though arraying those jobs may add a layer of complexity). It's something I'll have to look into for the next (impending) version.
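Choosing among those options first requires knowing which `mummer` is installed. A minimal sketch of version detection, assuming we parse the output of `nucmer --version` (the function names `parse_major_version` and `nucmer_major_version` are hypothetical, not part of `pyani`):

```python
import re
import subprocess


def parse_major_version(text):
    """Extract the leading major version number from a version string.

    MUMmer 3 reports e.g. "NUCmer (NUCleotide MUMmer) version 3.1",
    while MUMmer 4 reports e.g. "4.0.0beta2".
    """
    match = re.search(r"(\d+)\.\d+", text)
    return int(match.group(1)) if match else None


def nucmer_major_version(nucmer_exe="nucmer"):
    """Run `nucmer --version` and return the major version, or None.

    Both stdout and stderr are checked, since different MUMmer builds
    write the version banner to different streams.
    """
    result = subprocess.run(
        [nucmer_exe, "--version"], capture_output=True, text=True
    )
    return parse_major_version(result.stdout + result.stderr)
```

The exact banner text may vary between builds, so the regex deliberately only looks for the first `major.minor` pattern.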
My feeling is that the ultimate control will be in the hands of the user, who will have to use `pyani` parameters to somehow balance:

- the number of concurrent comparison jobs
- the number of cores/threads used by each `mummer` process

to their best advantage. Right now I'm targeting either an SGE-like scheduler, or local multicore, as these are what I have available to me. Any advice on how better to manage this is welcome.
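That balance can be sketched as a simple core-budget calculation: keep `workers * threads_per_job` within the available cores. This is an illustrative helper, not `pyani` code, and `plan_jobs` is a hypothetical name:

```python
import os


def plan_jobs(total_cores=None, threads_per_job=1):
    """Split a core budget into (workers, threads_per_job).

    With single-threaded MUMmer 3, threads_per_job is 1 and every core
    can host its own comparison job. With multithreaded MUMmer 4, the
    number of concurrent jobs must shrink so that
    workers * threads_per_job does not exceed the core budget.
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    workers = max(1, total_cores // threads_per_job)
    return workers, threads_per_job
```

On a scheduler, `total_cores` would come from the allocation (e.g. an environment variable set by the batch system) rather than `os.cpu_count()`.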
@widdowquinn Please advise as to the relevancy of this issue in the current state of the repo. This may play into some of the batching changes that need to be made, for instance.
This is relevant to the current state of the repo, and does, as you note, relate to how we move to a new SLURM-friendly batching approach.
The key issue here, I think, is how much control we provide the user over how jobs are distributed. This requires us to take into account how the underlying tool that is called distributes its jobs.
As described above, we could rely on `mummer3` running a single job on a single thread/core per comparison. This appears not to be the case with `mummer4`, which appears to run the required number of simultaneous comparisons, but "spreads out" threadwise to use as much processing capability as possible.
So, when we detect which version of `mummer` is in place on the user's machine, we need to be able to generate the appropriate command to produce the consistent behaviour we want.
If we want to enforce behaviour such that `--workers` controls the number of comparison jobs and makes this equal to the number of cores used (the current implementation), we will have to use distinct command lines for `mummer3` and `mummer4` to manage this. If we want to put that control in the hands of the user, we'll have to modify the CLI to give the user that extra control.
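A sketch of what version-specific command-line assembly could look like, assuming MUMmer 4's `nucmer` accepts a `--threads` option (check your build) while MUMmer 3's does not; `build_nucmer_cmd` is a hypothetical helper, not the actual `pyani` implementation:

```python
def build_nucmer_cmd(ref, qry, prefix, mummer_major, threads=1):
    """Assemble a nucmer command line appropriate to the MUMmer version.

    MUMmer 3's nucmer is single-threaded and has no threads option, so
    `--workers` alone maps jobs onto cores. For MUMmer 4 we pin each job
    to an explicit thread count (assumed `--threads` flag) so that
    concurrent jobs do not oversubscribe the machine.
    """
    cmd = ["nucmer", "--prefix", prefix]
    if mummer_major >= 4:
        cmd += ["--threads", str(threads)]
    cmd += [ref, qry]
    return cmd
```

The returned list could be passed straight to `subprocess.run`, or serialised for an SGE/SLURM job script.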
Originally posted by @sarah872 in https://github.com/widdowquinn/pyani/issues/20#issuecomment-440969428