opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

Should the default processes value be the maximum number of CPUs on a machine? #847

Open taylo5jm opened 5 years ago

taylo5jm commented 5 years ago

In the current version of cobrapy, the default processes attribute of the Configuration object is set to the output of the multiprocessing.cpu_count function. As a result, the processes attribute defaults to the maximum number of CPUs available on the machine.
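For illustration, a minimal snippet of the current behavior (the printed numbers will vary by machine):

```python
import multiprocessing

from cobra import Configuration

config = Configuration()
# By default `processes` matches multiprocessing.cpu_count(), i.e. every CPU
# the operating system reports, including hyperthreads.
print(config.processes, multiprocessing.cpu_count())
```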

I think this behavior can be convenient for some, but could be confusing to others. For example, I have seen a couple of questions recently that seem to have stemmed from differences in parallel processing between Windows and Unix systems.

It is possible to load and analyze a model without any knowledge of the configuration object, so the parallel behavior can be opaque to the end-user, which can make debugging more difficult for those that are new to Python and/or cobrapy.

From my perspective, using the maximum number of CPUs by default is a bit unconventional. I am thinking of mclapply and foreach in R. mclapply is a function that allows one to apply a function over a vector and the default number of cores is 2. foreach requires one to register a parallel backend with an explicit number of cores. If a user decides to use the %dopar% construct without registering a parallel backend, then only one core is used.

Thoughts? I am happy to make a PR if the group is interested in changing the behavior.

Midnighter commented 5 years ago

I don't have a clear opinion on the matter. In the issue you're referring to, #846, the problem is that you cannot interactively run multiprocessing on Windows. This would not have occurred if the global configuration default were 1.

Then again, almost all modern computers have more than 1 core, so we might be wasting potential if we choose a default of 1 and users do not use the processes argument of functions. Many applications these days use multiple cores; Google Chrome, for example, is quite greedy.

So I see two options:

  1. Be conservative and set a global default of one. Solves all issues but also limits potential.
  2. Set a value greater than one. We can now discuss what that value should be (maybe something like max(2, cpu_count() // 2) is a bit nicer; see the sketch after this list). We can additionally catch the specific error that occurred in #846 and provide a more helpful message that alerts a user to the processes argument and multiprocessing in interactive environments.
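A minimal sketch of option 2, assuming the default is computed once when the configuration is created (the helper name is hypothetical):

```python
import multiprocessing


def default_processes() -> int:
    """Use roughly half of the reported CPUs, but never fewer than two."""
    return max(2, multiprocessing.cpu_count() // 2)
```
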
taylo5jm commented 5 years ago

I understand the perspective that it could be a waste of potential to set the default processes value to 1. I know some people that have moved away from Chrome due to its greedy behavior. On the other hand, I am sure that I would be frustrated if Chrome was constrained to 1 CPU.

Thinking about this some more, I'm realizing that the choice of default will be somewhat arbitrary. Nonetheless, half of the maximum number of cores seems like a more reasonable default than the maximum. Without any knowledge of the parallel behavior, the user would (hopefully) neither bog down the machine nor drastically underutilize the CPU.

I think your suggestion of handling the specific Windows error is elegant and solves the immediate problem that led me to open this issue. I am happy to take a stab at implementing the error handling.

Midnighter commented 5 years ago

Yes, please feel free to go ahead with a PR :slightly_smiling_face:

cdiener commented 5 years ago

I would even go for cores - 1, since you usually want as much oomph for the computation as possible without freezing your system. Memory requirements are also something to consider: multiprocessing pickles the entire environment and reloads it into each new process, which for larger models can lead to significant memory usage.
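For comparison, the cores - 1 idea as a one-liner (using os.cpu_count(), which can return None):

```python
import os

# Leave one core free for the rest of the system, but never drop below one.
processes = max(1, (os.cpu_count() or 1) - 1)
```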

Midnighter commented 5 years ago

What's your experience here @cdiener? With parallel FVA, for example, I see near-linear gains in performance up to 4 processes (the physical number of cores on my machine); after that I see sub-linear gains. Due to Intel's hyperthreading, the number reported by cpu_count() on my system is 8, so for me the above // 2 is actually the sweet spot without locking down my entire system.

cdiener commented 5 years ago

For server CPUs I see a speed-up even for hyperthreads. For desktop systems I still do, but it's definitely smaller after you reach the real number of CPUs. That is a good point, though, even if I would not like to hard-code hyperthreading logic into cobrapy, since that only applies to a particular set of CPUs.

Midnighter commented 5 years ago

Apparently, psutil.cpu_count(logical=False) returns the number of physical cores. I'm not sure it's worth adding psutil as a dependency just for this.

A separate point is that we should also account for the number of CPUs allotted to the program (len(os.sched_getaffinity(0))). Especially on clusters, this is likely different.
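A rough sketch combining both points; note that os.sched_getaffinity is only available on some platforms (e.g. Linux), so a fallback is needed:

```python
import os


def available_cpus() -> int:
    """CPUs the current process may actually use, with a portable fallback."""
    try:
        # Respects the CPU affinity set, e.g., by a cluster scheduler.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Not available on all platforms; fall back to the total count.
        return os.cpu_count() or 1
```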

matthiaskoenig commented 5 years ago

Hi all,

We have to ask who our users are:

  1. Users who run analyses with the defaults and never touch the configuration.
  2. Users who explicitly tune the number of CPUs to their system resources.

For [1] we should have a setup that hits a sweet spot without freezing the system; I prefer number_of_cpus - 1 here. For [2] it doesn't matter, because they will set the maximum number of CPUs according to the available system resources. We could emit a warning from within the parallelized algorithms that simply checks whether the configured value is still the default of number_of_cpus - 1, e.g.

Using the default number of processes `number_of_cpus - 1` for running `{parallelizable_algorithm_name}`.
You can change this in `configuration`.
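As a sketch, such a check might look like the following (the function and argument names are placeholders, and the default shown assumes the number_of_cpus - 1 proposal):

```python
import multiprocessing
import warnings


def warn_if_default_processes(processes: int, algorithm_name: str) -> None:
    """Warn when a parallelized algorithm runs with the untouched default."""
    default = multiprocessing.cpu_count() - 1  # assumed default from this proposal
    if processes == default:
        warnings.warn(
            f"Using the default number of processes ({default}) for running "
            f"{algorithm_name}. You can change this in the configuration."
        )
```
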
pstjohn commented 4 years ago

Just to chime in on this: there's a non-negligible overhead in invoking the multiprocessing pool, at least on my recent MacBook Pro installation of cobrapy.

Configuration().processes gets set to 8 by default. For a fairly small model, I'm getting the following results:

%timeit flux_variability_analysis(model, reaction_list=model.exchanges)
18.4 s ± 263 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
5.71 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Midnighter commented 4 years ago

There definitely is a sizeable overhead involved in creating multiple processes; that is to be expected, especially if your number of exchange reactions is rather small. 18 seconds seems extremely long, though. Do you have any suggestions? Should we create a warning? I typically don't like to try to be too smart, i.e., I wouldn't like to automatically decide the pool size based on the number of reactions passed.

Midnighter commented 4 years ago

P.S.: What is the time required if you set processes=4, which I guess is the number of your physical cores rather than hyperthreads?

cdiener commented 4 years ago

In my experience the multiprocessing overhead is usually due to the initial pickling of your entire environment and sending it to the new processes, so it is important to run those benchmarks in an empty Python environment. For reference, I do see the overhead on my Mac as well, but with a much smaller discrepancy, and it becomes negligible for larger models.

In [1]: from cobra.test import create_test_model

In [2]: from cobra.flux_analysis import flux_variability_analysis

In [3]: model = create_test_model("textbook")

In [4]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=6)
254 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
23.6 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: model = create_test_model("ecoli")

In [7]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=6)
311 ms ± 45.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit flux_variability_analysis(model, reaction_list=model.exchanges, processes=1)
767 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)