joblib / loky

Robust and reusable Executor for joblib
http://loky.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

loky backend confuses timings #371

Closed Barry1 closed 2 years ago

Barry1 commented 2 years ago

When using joblib with the loky backend (and, I expect, whenever loky is used), the CPU time of the child processes is not picked up by timing methods such as os.times(). This is best explained with the following MWE:

import joblib
import os

def blocking_io():
    # Small I/O-bound task: read 100 random bytes from the
    # kernel's random device (Linux/Unix only).
    with open("/dev/urandom", "rb") as f:
        return f.read(100)

def evaltiming(before: os.times_result, after: os.times_result) -> None:
    print(before)
    print(after)
    diff: list[float] = [a - b for (a, b) in zip(after, before)]
    print(f"{diff[0]+diff[1]} own compute seconds")
    print(f"and {diff[2]+diff[3]} children compute seconds")
    print(f"in {diff[4]} real seconds.")
    print(f"{100 * sum(diff[:4]) / diff[4]} % CPU load")

if __name__ == "__main__":
    print(8 * "==========")
    for bckend in ("loky", "multiprocessing", "threading"):
        print(f"Running with {bckend}")
        before: os.times_result = os.times()
        joblib.Parallel(n_jobs=4, backend=bckend)(
            joblib.delayed(blocking_io)() for _ in range(1000000)
        )
        evaltiming(before, os.times())
        print(8 * "==========")

This runs and times the three backends with an I/O-bound function. The numbers reported for loky show that something cannot be right, especially in comparison to multiprocessing.

I think the child processes are launched in a way that os.times (and similar tools, such as the resource module) does not register them as children.

When the MWE is run from the Linux console under time, the reported timings are correct. Could this behaviour be changed so that the loky timings are right as well?
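
One possible explanation for the difference is that the children fields of os.times() only include child processes that have already terminated and been waited for, whereas time reads the totals after the whole Python process has exited. The following is a minimal standard-library sketch of that effect on a POSIX system; it does not use loky at all, and CHILD_CODE, children_cpu_seconds and demo are illustrative names only.

```python
import os
import subprocess
import sys
import time

# CPU-bound snippet executed in the child; the loop size is arbitrary.
CHILD_CODE = "sum(i * i for i in range(3_000_000))"


def children_cpu_seconds() -> float:
    """CPU seconds (user + system) of children that were already waited for."""
    t = os.times()
    return t.children_user + t.children_system


def demo() -> tuple[float, float]:
    start = children_cpu_seconds()
    child = subprocess.Popen([sys.executable, "-c", CHILD_CODE])
    # Give the child time to run; nobody has waited for it yet.
    time.sleep(1.0)
    before_wait = children_cpu_seconds() - start
    # Waiting for the child adds its CPU time to the parent's counters.
    child.wait()
    after_wait = children_cpu_seconds() - start
    return before_wait, after_wait


if __name__ == "__main__":
    before_wait, after_wait = demo()
    print(f"before wait(): {before_wait:.2f} s, after wait(): {after_wait:.2f} s")
```

If loky's workers are still alive when os.times() is read, the same thing would happen: their CPU time would not have reached the parent's counters yet.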

Barry1 commented 2 years ago

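I found a way to get stable timings. Because loky keeps the worker pool alive for a while after the call returns, the children's CPU times had not yet been added to the calling process when os.times() was read (the children fields only count child processes that have terminated and been waited for). There are two ways around this: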

  1. summing the timings of the child processes directly (perhaps with the help of psutil), or
  2. shutting down the workers. In https://github.com/joblib/joblib/issues/945 I found that this can be done with get_reusable_executor().shutdown(wait=True)
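
Option 1 could be sketched roughly as follows, assuming psutil is installed. The helper names live_children_cpu and cpu_delta are made up for illustration and are not part of psutil, joblib or loky.

```python
import psutil


def live_children_cpu() -> dict[int, float]:
    """Map pid -> CPU seconds (user + system) for every live descendant."""
    usage: dict[int, float] = {}
    for child in psutil.Process().children(recursive=True):
        try:
            t = child.cpu_times()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # the process vanished or cannot be inspected
        usage[child.pid] = t.user + t.system
    return usage


def cpu_delta(before: dict[int, float], after: dict[int, float]) -> float:
    """CPU seconds spent by still-living children between two snapshots."""
    # Children that appeared after the first snapshot count from zero.
    return sum(secs - before.get(pid, 0.0) for pid, secs in after.items())
```

Take one snapshot before the joblib.Parallel call and one right after it, then pass both to cpu_delta. Note that this only sees workers that are still alive at the second snapshot; workers that have exited and been waited for in between show up in the children fields of os.times() instead.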

I personally went with option 2, but I'd appreciate it if loky offered something like a "really_shutdown" method for this.
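
For reference, a minimal sketch of option 2 wrapped in a context manager. It assumes that joblib's bundled copy of loky is importable as joblib.externals.loky; children_cpu_timer is a hypothetical helper name, not an existing loky or joblib API.

```python
import os
from contextlib import contextmanager

import joblib
from joblib.externals.loky import get_reusable_executor


@contextmanager
def children_cpu_timer():
    """Hypothetical helper: measure children CPU time spent inside the block."""
    stats = {}
    before = os.times()
    try:
        yield stats
    finally:
        # Shut the reusable workers down and wait for them, so that their
        # CPU time is added to this process's children counters.
        get_reusable_executor().shutdown(wait=True)
        after = os.times()
        stats["children_cpu"] = (
            (after.children_user - before.children_user)
            + (after.children_system - before.children_system)
        )
        stats["elapsed"] = after.elapsed - before.elapsed


if __name__ == "__main__":
    with children_cpu_timer() as stats:
        joblib.Parallel(n_jobs=2, backend="loky")(
            joblib.delayed(sum)(range(200_000)) for _ in range(50)
        )
    print(stats)
```

The trade-off is that the next joblib.Parallel call has to start fresh workers, so this is only worthwhile when accurate timings matter more than worker reuse.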