indygreg / PyOxidizer

A modern Python application packaging and distribution tool
Mozilla Public License 2.0

Windows multiprocessing spawn error #698

Open bengetch opened 1 year ago

bengetch commented 1 year ago

While using the multiprocessing library on Windows, I get the following error:

Traceback (most recent call last):
  File "multiprocessing.spawn", line 107, in spawn_main
  File "multiprocessing.reduction", line 79, in duplicate
TypeError: DuplicateHandle() argument 2 must be int, not dict

The code I'm using is roughly:

with multiprocessing.Pool(
    processes=<int>, initializer=<func>, initargs=[<bytes>, ]
) as pool:
    output = pool.map(
        <function>,
        tqdm(tqdm(<list<str>>, fh=<file_handler>))
    )

The use of tqdm with a file handler is just so that progress logging can be written to a file; replacing it with an ordinary list yields the same error. The code itself builds and executes just fine on Linux and macOS (both Intel and ARM).

I've tried various configurations suggested here without success. Any insight as to what could be happening? Thanks!

bengetch commented 1 year ago

Update:

I've narrowed the issue down further, and the following code snippet produces the exact same error:

import multiprocessing as mp

def update_func(v):
    return v * 2

def multiproc_minimal():
    with mp.Pool(processes=2) as pool:
        results = pool.map(update_func, [1, 2, 3, 4])

    print(results)

if __name__ == "__main__":
    multiproc_minimal()

For reference, my pyoxidizer.bzl file looks like the following:

def make_exe():

    dist = default_python_distribution(python_version="3.9")
    policy = dist.make_python_packaging_policy()

    python_config = dist.make_python_interpreter_config()
    python_config.run_filename = "encryptor.py"

    exe = dist.to_python_executable(
        name="encryptor",
        packaging_policy=policy,
        config=python_config,
    )

    # the code is in a file called encryptor.py
    exe.add_python_resources(exe.read_package_root(
        path="./",
        packages=["encryptor"]
    ))
    return exe

def make_embedded_resources(exe):
    return exe.to_embedded_resources()

def make_install(exe):
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files

def make_msi(exe):
    return exe.to_wix_msi_builder(
        "myapp",
        "My Application",
        "1.0",
        "Alice Jones"
    )

def register_code_signers():
    if not VARS.get("ENABLE_CODE_SIGNING"):
        return

register_code_signers()
register_target("exe", make_exe)
register_target("resources", make_embedded_resources, depends=["exe"], default_build_script=True)
register_target("install", make_install, depends=["exe"], default=True)
register_target("msi_installer", make_msi, depends=["exe"])
resolve_targets()

The specific Windows environment that I'm building on is the AWS "Microsoft Windows Server 2022 Base" image.

jevansbio commented 1 year ago

Seems like the same issue I was having:

https://github.com/indygreg/PyOxidizer/issues/531

I never found a fix, unfortunately.

bengetch commented 1 year ago

Ah, damn. Did you have the issue with multiprocessing more generally, or is it limited to multiprocessing.Pool (or Pool.map(), Pool.starmap(), etc.)? I'm hoping to have some time today to try out some alternative workflows, like using Process objects directly instead of Pool, but I'd be interested to hear what you might have tried already.

jevansbio commented 1 year ago

I think I restricted myself to the default multiprocessing library at the time!

msundvick commented 1 year ago

Just ran into this. Could be wrong, but this line seems like the culprit? https://github.com/indygreg/PyOxidizer/blob/b78b0cb75f4317c45408bbc9a569c062c482c679/pyembed/src/interpreter.rs#L576

My theory is that call1 takes positional arguments only, no kwargs. It calls def spawn_main(pipe_handle, parent_pid=None, tracker_fd=None), which then receives the kwargs dict as pipe_handle instead of having it unpacked into keyword arguments. That dict eventually ends up in _winapi.DuplicateHandle(source_process, handle, target_process, 0, inheritable, _winapi.DUPLICATE_SAME_ACCESS) as the handle argument, hence the error.

If this is the case, then maybe the fix would be to use spawn_module.getattr("spawn_main")?.call((), Some(kwargs))?;?
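
To make the theory concrete, here is a minimal pure-Python sketch of the difference between passing the dict positionally and unpacking it. The function spawn_main_stub is a hypothetical stand-in that only copies the spawn_main signature quoted above, the int check imitates the requirement of DuplicateHandle rather than calling it, and the handle and PID values are made up for illustration.

```python
# Hypothetical stand-in that only mirrors the signature of
# multiprocessing.spawn.spawn_main quoted above; it is not the real function.
def spawn_main_stub(pipe_handle, parent_pid=None, tracker_fd=None):
    # The real code eventually hands pipe_handle to _winapi.DuplicateHandle,
    # which requires an int; this check imitates that requirement.
    if not isinstance(pipe_handle, int):
        raise TypeError(
            "handle must be int, not " + type(pipe_handle).__name__
        )
    return pipe_handle, parent_pid, tracker_fd


# Example values only (assumptions for illustration).
kwargs = {"pipe_handle": 632, "parent_pid": 4120}

# Passing the dict as the single positional argument binds it to pipe_handle.
try:
    spawn_main_stub(kwargs)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# Unpacking the dict binds each key to the matching parameter.
unpacked_result = spawn_main_stub(**kwargs)

print(positional_error)
print(unpacked_result)
```

If the theory holds, the first call corresponds to what call1 does with the kwargs dict, and the second corresponds to the proposed call((), Some(kwargs)) form.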