joblib / loky

Robust and reusable Executor for joblib
http://loky.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Loky test suite is much slower on macOS than on Linux or Windows #401

Closed (ogrisel closed 1 year ago)

ogrisel commented 1 year ago

For instance on:

The macOS runs (install + test + results upload) take around 30 min while the Windows and Linux builds run in 7 min.

But the hardware may not be comparable (or the macOS VMs on Azure may be more oversubscribed and generally slower than the Windows and Linux VMs).

On my local macOS (M1) machine, the test suite runs in 7 min, which is not too bad, but the same test suite runs in 3 min on the same machine inside a Linux Docker container.

So something macOS-specific makes loky run roughly 2x slower on that OS.

I think we need to isolate a minimal case where loky on macOS is significantly slower than on Linux, and then check whether this is also the case with concurrent.futures.ProcessPoolExecutor and the spawn start method.
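
As a starting point, such a comparison could look roughly like the sketch below. It is not taken from the loky test suite: `time_executor`, `make_spawn_pool` and `make_loky_pool` are hypothetical helper names, and the worker and task counts are arbitrary assumptions. It times pool startup, a batch of trivial tasks and shutdown for the stdlib executor with the spawn start method, then for loky if it is installed.

```python
import math
import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor


def time_executor(make_executor, n_tasks=200):
    """Return the seconds spent starting a pool, running trivial tasks
    and shutting the pool down again."""
    start = time.perf_counter()
    with make_executor() as executor:
        # math.sqrt is picklable by reference, so it works with any start method.
        results = list(executor.map(math.sqrt, range(n_tasks)))
    elapsed = time.perf_counter() - start
    assert len(results) == n_tasks
    return elapsed


def make_spawn_pool():
    # Stdlib executor forced to use the spawn start method (2 workers is arbitrary).
    return ProcessPoolExecutor(max_workers=2, mp_context=mp.get_context("spawn"))


def make_loky_pool():
    # Imported lazily so the stdlib half still runs when loky is not installed.
    from loky import ProcessPoolExecutor as LokyExecutor

    return LokyExecutor(max_workers=2)


if __name__ == "__main__":
    print(f"stdlib spawn: {time_executor(make_spawn_pool):.2f}s")
    try:
        print(f"loky:         {time_executor(make_loky_pool):.2f}s")
    except ImportError:
        print("loky is not installed, skipping")
```

Running the same script on macOS and in a Linux container on the same machine would show whether the gap is specific to loky or shared with the stdlib spawn-based executor.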

ogrisel commented 1 year ago

Something weird: I just observed on a running macOS build that pytest finished running 100% of the tests in 17 min (which is already quite slow) but then waited several minutes joining a subprocess in the atexit finalizers in MainProcess:MainThread.

=========== 292 passed, 22 skipped, 1 warning in 1028.79s (0:17:08) ============
[INFO:MainProcess:MainThread] process shutting down
[DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:MainProcess:MainThread] telling queue thread to quit
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[INFO:MainProcess:MainThread] calling join() for process LokyProcess-1334

Here is the link of the build:

https://dev.azure.com/joblib/loky/_build/results?buildId=2984&view=logs&j=07acc468-c4ed-5e5d-e6de-dce332f6ba5b&t=c1b52095-0117-5f22-bf8e-30aa6f5be072

ogrisel commented 1 year ago

This also happened on this run (the merge of #394 in the master branch):

The tests ran in 20 min according to pytest but then an extra 10 minutes were wasted in joining LokyProcess instances in the atexit finalizers.
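
To quantify how long such joins take, a small helper like the one below could be used. This is only a sketch: `timed_join` is a hypothetical name, not part of loky or multiprocessing, and the timeouts are arbitrary. It relies only on `join(timeout)` and `is_alive()`, which both `threading.Thread` and `multiprocessing.Process` provide, and the demo uses threads to stay simple.

```python
import threading
import time


def timed_join(worker, timeout=5.0):
    """Join `worker` for at most `timeout` seconds.

    Returns (finished, seconds_waited). `worker` only needs join(timeout)
    and is_alive(), so threads and processes are both accepted.
    """
    start = time.perf_counter()
    worker.join(timeout)
    waited = time.perf_counter() - start
    return (not worker.is_alive()), waited


if __name__ == "__main__":
    quick = threading.Thread(target=time.sleep, args=(0.1,))
    # Daemon thread so the demo does not block interpreter exit.
    slow = threading.Thread(target=time.sleep, args=(2.0,), daemon=True)
    for worker in (quick, slow):
        worker.start()
    print("quick:", timed_join(quick, timeout=1.0))
    print("slow: ", timed_join(slow, timeout=0.2))
```

A worker that is still alive after the timeout is a candidate for the kind of slow join seen in the build logs.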

ogrisel commented 1 year ago

Actually this can also happen on Windows and even Linux (more rarely).

This is a duplicate of #397 and it's being investigated in #399.