Open tomMoral opened 1 year ago
This sounds awesome! Hoping that freeze_support
can finally be added to loky
and work with pyinstaller
. Thank you so much @tomMoral ☺️
Are we still needed here? I can probably try to get it running, although I scrapped my original windows vm since then, but we now have github actions instead, so probably something to do there. Can't remember if it was also a problem on other platforms too, but that seems likely.
I think we need an integration test that runs a minimal loky-based program through pyinstaller and check that it runs correctly.
And we also need a docstring for this: https://github.com/joblib/loky/pull/375/files#r1061777509.
I tried this PR with pyinstaller on the following test program named frozen_loky.py
):
import loky
if __name__ == "__main__":
loky.freeze_support()
e = loky.get_reusable_executor()
print(sum(e.map(lambda x: x ** 2, range(1000))))
$ pyinstaller --version
5.8.0
$ pyinstaller frozen_loky.py
$ ./sdist/frozen_loky/frozen_loky
332833500
$ 332833500
332833500
332833500
332833500
332833500
332833500
[...]
and it creates many subprocesses that continuously output the result on stdout even after the parent process has completed (I had to pkill frozen_loky
to terminate them all).
I tried both at 95ae81c and after the merge of the master
branch at 00c8ded and got the same behavior in both cases so I think the problem does not comes.
This is running:
$ python -V
Python 3.10.2
on Linux. Note: I cannot get pyinstaller to work on my local macOS machine because of the problem with the codesign
command.
Hum, I re-read the original description of the issue and maybe parsing the command line args to raise the RuntimeError
is actually needed? This seems not very user friendly.
Hum, I re-read the original description of the issue and maybe parsing the command line args to raise the RuntimeError is actually needed? This seems not very user friendly.
I tried just that and could not get it to work either. Or maybe it's working as expected but the error messages on stderr are not expected.
Thanks @tomMoral! I confirm that the latest commit make my minimal snippet from https://github.com/joblib/loky/pull/375#issuecomment-1438287445 work fine!
Summary of the failures of my new test:
frozen_loky
executable generated by pyinstaller. I can reproduce that when I execute the test in tox locally but the same test works when running the test outside of tox (with pytest directly in my conda env with all the test dependencies manually installed). I get extra warnings when running the pyinstaller command (using the -s
flag via tox -e py310 -- -s -k test_freeze_support_with_pyinstaller
) that seem to point that pyinstaller has trouble to generate the froze executable when executed from with the tox managed venv nested under the CI conda env. Need to investigate to know the details,codesign
command, probably a bug of pyinstaller
,FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\VssAdministrator\\Application Data\\pyinstaller'
File "/home/vsts/work/1/s/.tox/pypy3/site-packages/PyInstaller/depend/bytecode.py", line 36, in _instruction_to_regex
return re.escape(bytes([opmap[x]]))
KeyError: 'LOAD_METHOD'
I will try to see if I can link some of those problems to known issues in the pyinstaller issue tracker.
A few things from experience:
Thanks for the feedback. Note that the warnings I mentioned above when running pyinstaller under tox under conda env do not show up on the CI, yet we still get the segfaults under some macos and linux jobs.
So the warnings are probably harmless and unrelated. I will try to craft a few minimal CI envs to specifically run this test and remove the pyinstaller dependency from the tox.ini
file so that it's not run from the other envs.
The Windows test seems to have uncovered a real problem in the concurrent future queues:
loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "loky\process_executor.py", line 426, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "multiprocessing\queues.py", line 108, in get
PermissionError: [WinError 5] Access is denied
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "frozen_loky.py", line 7, in <module>
File "loky\process_executor.py", line 962, in _chain_from_iterable_of_lists
for element in iterable:
File "concurrent\futures\_base.py", line 621, in result_iterator
File "concurrent\futures\_base.py", line 319, in _result_or_cancel
File "concurrent\futures\_base.py", line 403, in __get_result
loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
In 27e0ad7, I changed the frozen program to use concurrent.futures.ProcessPoolExecutor
and multiprocessing.freeze_support
and it works without error on all 3 supported platforms.
The error observed under windows look similar to what was fixed by @pierreglaser in #216 but comparing with the way the executable is introspected in the current version of those files (spawn.py, popen_loky_win32.py and resource_tracker.py) I cannot spot the discrepancy (but I did not attempt to perform a full code diff either).
I reverted my last two commits, I don't have a windows machine at hand so it's really painful to debug via the CI. I should try to setup a windows vm on another local machine but I won't have time today unfortunately.
The macos failure (TestResourceTracker.test_resource_tracker[folder]
) is random an unrelated to this PR.
For information, I tried to reproduce locally on a Windows VM and I can reproduce both with Python 3.11, 3.10 and 3.9 so it's not specific to a given Python version (as I suspected after seeing the seemingly related failure in #389) but rather a bug in loky itself.
Some of the latest changes cause the resource tracker start-up to crash on all windows jobs:
Traceback (most recent call last):
File "X:\loky\loky\backend\spawn.py", line 283, in main
exitcode = _main(fd, parent_sentinel=parent_sentinel)
File "X:\loky\loky\backend\spawn.py", line 301, in _main
prepare(preparation_data)
File "X:\loky\loky\backend\spawn.py", line 195, in prepare
_resource_tracker._fd = msvcrt.open_osfhandle(handle, 0)
OSError: [Errno 9] Bad file descriptor
Yes this is expected. The strategy put in place by multiprocessing
is to not use inheritance
anymore but to use duplication.
I changed it for the main process but I also need to adapt the ressource_tracker
strategy now that I understood the inheritance. hopefully I will get the time to investigate this by tomorrow.
After (too much) investigation, I think this is actually related to pyinstaller
and the way the frozen programs are created that mess around with the permission of the worker process (see pyinstaller/pyinstaller#7183 and https://github.com/pyinstaller/pyinstaller/blob/develop/PyInstaller/hooks/rthooks/pyi_rth_multiprocessing.py)
Not sure how to fix this yet...
It's weird, I tried to use concurrent.futures.ProcessPoolExecutor
/ multiprocessing.freeze_support
in the pyinstaller test of this PR and it passes. There must be something specific to the spawn variant implemented in loky that causes this problem.
For information, I tried to see if the refactoring in this PR would potentially fix the SIGSEGV(-11)
crash observed on macOS in #388 but unfortunately this does not solve it.
It's weird, I tried to use
concurrent.futures.ProcessPoolExecutor
/multiprocessing.freeze_support
in the pyinstaller test of this PR and it passes. There must be something specific to the spawn variant implemented in loky that causes this problem.
multiprocessing
is monkeypatched but pyinstaller
but loky
is not so it is not weird that it fails on loky
.
The complex thing is, I don't get what part of the monkey patching I need ^^. I tried a bit but I had to move on.
If I confirm this; then, we can xfail this test and add a hook
for loky in pyinstaller to fix this and move on. But we have to make sure this is the case first...
+1 for opening an issue upstream on the pyinstaller issue tracker and xfailing the windows test case in loky in the mean time.
Please update the changelog accordingly to document that freeze support for windows is still a work in progress.
@tomMoral CI trigger
I believe you meant to merge or rebase with current master
no?
Actually no need, because the latest CI config of master
is used in the PR even if there are not up to date with master
.
Unfortunately, this PR does not fix the TestsProcessPoolSpawnExecutor.test_max_depth[True]
failure observed for Windows and Python 3.11.
There is another, maybe related failure in test_kill_process_tree
. I opened #392.
Actually, the test_kill_process_tree
failure in this PR is different from the one on master (as reported in #392). Here is the traceback:
================================== FAILURES ===================================
________________________ test_kill_process_tree[True] _________________________
use_psutil = True
@pytest.mark.parametrize("use_psutil", [True, False])
def test_kill_process_tree(use_psutil):
psutil = pytest.importorskip("psutil")
event = ctx_loky.Event()
p = ctx_loky.Process(target=_run_nested_delayed, args=(4, 1000, event))
p.start()
# Wait for all the processes to be launched
if not event.wait(30):
kill_process_tree(p, use_psutil=use_psutil)
> raise RuntimeError(
"test_kill_process_tree was not able to launch "
"all nested processes."
)
E RuntimeError: test_kill_process_tree was not able to launch all nested processes.
event = <Event at 0x20e3e77e9d0 unset>
p = <LokyProcess name='LokyProcess-20' pid=1792 parent=7720 stopped exitcode=15>
psutil = <module 'psutil' from 'D:\\a\\1\\s\\.tox\\py311\\Lib\\site-packages\\psutil\\__init__.py'>
use_psutil = True
tests\test_loky_backend.py:721: RuntimeError
---------------------------- Captured stdout call -----------------------------
--------------------------------------------------------------------------------
None failed with traceback:
--------------------------------------------------------------------------------
Traceback (most recent call last):
File "D:\a\1\s\loky\backend\spawn.py", line 298, in main
exitcode = _main(fd, parent_sentinel=parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\a\1\s\loky\backend\spawn.py", line 317, in _main
self = pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
This is an attempt to fix
loky
for frozen support for all plateforms. (cc #236 , joblib/joblib#1002). (I got nerd snipped :sweat_smile: )This basically adds a
freeze_support
function that allows starting the worker with the right executable if it is called right afterif __name__ == "__main__":
. This PR also cleanups and refactor the command line generation, to avoid some duplicated code.I used the following scripts to test and reproduce the error in pure
loky
:this gives me the output:
Note that the error is due to the fact that actually, the resource tracker from
cpython
is also broken. I am not totally sure why it is actually started but this is not a problem for running the code. If the code from this PR actually fixes the issue, it could be ported to python to make the frozen support working with all platforms/start methods and with the resource tracker.As a disclaimer, I did not test this on windows and I would be very glad to have some feedbacks from the original reporters. Maybe from @samuelstjean or @gdoras if you are still around. Sorry for the slow response....
TODO before merge: