Open taylo5jm opened 5 years ago
I don't have a clear opinion on the matter. In the issue you're referring to, #846, the problem is that you cannot interactively run multiprocessing on Windows. This would not have occurred if the global configuration default had been 1.
Then again, almost all modern computers have more than 1 core. So we might be wasting potential if we choose a default of 1 and users do not use the processes argument of functions. Many applications these days use multiple cores; Google Chrome, for example, is quite greedy.
So I see two options: default to a single process, or to a fraction of the available cores (I think max(2, cpu_count() // 2) is a bit nicer). We can additionally catch the specific error that occurred in #846 and provide a more helpful message that alerts a user to the processes argument and to multiprocessing in interactive environments.

I understand the perspective that it could be a waste of potential to set the default processes value to 1. I know some people who have moved away from Chrome due to its greedy behavior. On the other hand, I am sure that I would be frustrated if Chrome were constrained to 1 CPU.
Thinking about this some more, I'm realizing that the choice of default will be somewhat arbitrary. Nonetheless, half of the maximum number of cores seems like a more reasonable number than the maximum. In the absence of knowledge about the parallel behavior, the user would (hopefully) neither bog down the machine nor drastically underutilize the CPU.
I think your suggestion of handling the specific Windows error is elegant and solves the immediate problem that led me to open this issue. I am happy to take a stab at implementing the error handling.
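For illustration, something along these lines could work (a rough sketch only; the helper name and the assumption that the failure surfaces as a RuntimeError are mine, not cobrapy's actual code):

```python
# Hypothetical sketch: wrap pool creation and re-raise with a hint pointing
# the user at the `processes` argument. The exception type caught here is an
# assumption; the real error seen in #846 may differ.
import multiprocessing


def create_pool(processes):
    """Create a worker pool, adding a hint for interactive Windows sessions."""
    try:
        return multiprocessing.Pool(processes=processes)
    except RuntimeError as error:
        raise RuntimeError(
            "Could not start worker processes. If you are running cobrapy "
            "interactively on Windows, consider passing `processes=1` or "
            "lowering the value in the configuration."
        ) from error
```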
Yes, please feel free to go ahead with a PR :slightly_smiling_face:
I would even go for cores - 1, since you usually want as much oomph for the computation as possible without freezing your system. Memory requirements are also something to consider: multiprocessing pickles the entire environment and reloads it into each new process. For larger models this can lead to significant memory usage.
What's your experience here @cdiener? With parallel FVA, for example, I see near-linear gains in performance up to 4 processes (the physical number of cores on my machine); after that I see sub-linear gains. Due to Intel's hyperthreading, the number reported by cpu_count() on my system is 8. So for me, the above // 2 is actually the sweet spot without locking down my entire system.
For server CPUs I see a speed-up even for hyperthreads. For desktop systems I still do, but it's definitely less once you go beyond the real number of cores. That is a good point, though, even if I would not like to hard-code hyperthreading logic into cobrapy, since that only applies to a particular set of CPUs.
Apparently, psutil.cpu_count(logical=False) returns the physical number of cores. I'm not sure it's worth adding psutil as a dependency just for this.

A separate point is that we should also account for the number of CPUs allotted to the program (len(os.sched_getaffinity(0))). Especially on clusters, this is likely different.
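To make that concrete, here is a sketch of how these pieces could be combined into a default (the helper name is purely illustrative, and psutil is treated as optional rather than as a new dependency):

```python
# Illustrative only: combine the affinity-aware count, the physical core count
# (if psutil happens to be installed), and the "leave one core free" idea.
import multiprocessing
import os


def default_processes():
    """Return a conservative default number of worker processes."""
    try:
        # CPUs actually allotted to this process (relevant on clusters);
        # sched_getaffinity is not available on every platform (e.g. macOS).
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        available = multiprocessing.cpu_count()
    try:
        import psutil

        physical = psutil.cpu_count(logical=False)
        if physical:
            available = min(available, physical)
    except ImportError:
        pass
    # Leave at least one core for the rest of the system.
    return max(1, available - 1)
```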
Hi all, we have to ask who our users are: [1] users who rely on the defaults, and [2] advanced users who configure the number of CPUs themselves.
For [1] we should have a setup which hits a sweet spot without freezing the system; I prefer number_of_cpus - 1 here. For [2] it doesn't matter, because they will set the maximum number of CPUs according to the available system resources.
We could provide a warning from within the parallelized algorithms which would just check if the configured value is still the default of number_of_cpus - 1, e.g.

Using the default number of processes `number_of_cpus - 1` for running `{parallelizable_algorithm_name}`. You can change this in `configuration`.
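Roughly something like this (the function and constant names are only illustrative, not an existing cobrapy API):

```python
# Illustrative sketch of the proposed check inside a parallelizable algorithm.
import logging
import multiprocessing

logger = logging.getLogger(__name__)

# Assumed default, following the number_of_cpus - 1 suggestion above.
DEFAULT_PROCESSES = max(1, multiprocessing.cpu_count() - 1)


def warn_if_default_processes(processes, algorithm_name):
    """Emit a hint if the caller is relying on the default process count."""
    if processes == DEFAULT_PROCESSES:
        logger.warning(
            "Using the default number of processes (%d) for running %s. "
            "You can change this in the configuration.",
            processes,
            algorithm_name,
        )
```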
Just to chime in on this, there's a non-negligible overhead in invoking the multiprocessing pool; at least there appears to be on my recent MacBook Pro installation of cobrapy.
Configuration().processes
gets set to 8 by default.
For a fairly small model, I'm getting the following results:
%timeit flux_variability_analysis(model, reaction_list=model.exchanges)
18.4 s ± 263 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
5.71 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
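For what it's worth, the default can also be lowered once per session instead of passing processes to every call (assuming the Configuration singleton discussed in this thread):

```python
from cobra import Configuration

config = Configuration()
config.processes = 1  # subsequent parallelizable calls fall back to this value
```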
There definitely is a sizeable overhead involved in creating multiple processes. That is to be expected, especially if the number of exchange reactions is rather small. 18 seconds seems extremely long, though. Do you have any suggestions? Should we create a warning? I typically don't like to try to be too smart, i.e., I wouldn't like to automatically decide the pool size based on the number of reactions passed.

P.S.: What is the time required if you set processes=4, which I guess is the number of your physical cores rather than hyperthreads?
In my experience the multiprocessing overhead is usually due to the initial pickling of your entire environment and sending it to the new processes, so it is important to run those benchmarks in an empty Python environment. For reference, I do see that on my Mac as well, but not with that large a discrepancy, and it becomes negligible for larger models.
In [1]: from cobra.test import create_test_model
In [2]: from cobra.flux_analysis import flux_variability_analysis
In [3]: model = create_test_model("textbook")
In [4]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=6)
254 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
23.6 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: model = create_test_model("ecoli")
In [7]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=6)
311 ms ± 45.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
767 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In the current version of cobrapy, the default processes attribute of the Configuration object is often set to the output of the multiprocessing.cpu_count function. This behavior results in setting the processes attribute to the maximum number of CPUs that are available on the machine.

I think this behavior can be convenient for some, but could be confusing to others. For example, I have seen a couple of questions recently that seem to have stemmed from differences in parallel processing between Windows and Unix systems. It is possible to load and analyze a model without any knowledge of the configuration object, so the parallel behavior can be opaque to the end user, which can make debugging more difficult for those who are new to Python and/or cobrapy.
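For reference, the current behavior can be seen in a couple of lines (assuming the Configuration singleton and its processes attribute as described above):

```python
import multiprocessing

from cobra import Configuration

config = Configuration()
# By default, processes mirrors the total CPU count of the machine.
print(config.processes, multiprocessing.cpu_count())
```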
From my perspective, using the maximum number of CPUs by default is a bit unconventional. I am thinking of mclapply and foreach in R. mclapply is a function that allows one to apply a function over a vector, and the default number of cores is 2. foreach requires one to register a parallel backend with an explicit number of cores; if a user decides to use the %dopar% construct without registering a parallel backend, then only one core is used.

Thoughts? I am happy to make a PR if the group is interested in changing the behavior.