marcelotduarte / cx_Freeze

cx_Freeze creates standalone executables from Python scripts, with the same performance, is cross-platform and should work on any platform that Python itself works on.
https://marcelotduarte.github.io/cx_Freeze/

cx_Freeze with torch.multiprocessing using wrong source in child processes #2376

Closed dmagee closed 4 months ago

dmagee commented 5 months ago


Describe the bug: On Linux, when I use cx_Freeze with a Python script that uses torch.multiprocessing to create multiple processes (which essentially calls multiprocessing), the child processes seem to try to use the original Python files (for the program) and the original Python environment (for Python modules), not the ones in the build directory. The initial result is errors about the program's .py source files not being found. Other errors can occur if the source is copied into the build folder.

To Reproduce: Environment is Linux, Python 3.11, PyTorch v2.2.2+cu121 [Note: This problem does not occur on Windows]

Minimal source (Minimal.py):

import os
os.environ['KERAS_BACKEND'] = "torch"

import torch

def per_device_launch_fn(current_gpu_index, num_gpu):
    # Stand-in for the per-device training loop.
    for i in range(1, 1000):
        print("Train...")

num_gpu = 4

if __name__ == "__main__":
    print("Starting multiprocessing:" + str(num_gpu))
    # start_processes passes the process index as the first argument.
    torch.multiprocessing.start_processes(
        per_device_launch_fn,
        args=(num_gpu,),
        nprocs=num_gpu,
        join=True,
        start_method="spawn",
    )

The build script is:

import os
import sys

from cx_Freeze import setup, Executable

sys.setrecursionlimit(5000)

os.environ['KERAS_BACKEND'] = "torch"

build_exe_options = {"packages": ["onnx", "numpy", "torch", "PIL", "torchvision", "keras", "sympy", "integrals", "multiprocessing"]}

setup(
    name="Minimal",
    version="1.0",
    description="Minimal",
    options={"build_exe": build_exe_options},
    executables=[Executable("Minimal.py")],
)

Expected behavior I would expect the pyc versions of code in the build folder to be used under all circumstances (even by child processes), not the original ones.


dmagee commented 4 months ago

Re-opening, as the fix I previously posted doesn't actually work (unless the source is present in the launch folder).

marcelotduarte commented 4 months ago

On Linux, when I use cx_Freeze with a Python script that uses torch.multiprocessing to create multiple processes (which essentially calls multiprocessing), the child processes seem to try to use the original Python files (for the program) and the original Python environment (for Python modules), not the ones in the build directory. The initial result is errors about the program's .py source files not being found. Other errors can occur if the source is copied into the build folder.

The paths shown there are debug information only. They can be changed with replace_paths.

The real bug, however, must be in the use of multiprocessing. With the stdlib's multiprocessing we need to call freeze_support, but torch.multiprocessing probably does not have this function, so a workaround needs to be investigated.
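
For reference, a minimal sketch of the stdlib pattern mentioned above, assuming plain multiprocessing with the spawn start method rather than torch.multiprocessing. The names worker and main are hypothetical, and whether this call alone is sufficient for torch.multiprocessing in a frozen Linux build is the open question in this issue.

```python
import multiprocessing


def worker(index):
    # Runs in the child process (hypothetical stand-in for real work).
    print(f"worker {index} running")


def main():
    # Use the same start method as the report ("spawn").
    ctx = multiprocessing.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    # Exit codes of the children; a non-zero code means a child failed.
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":
    # Call this straight after the main guard, before any other
    # multiprocessing work, so a frozen child can take over here.
    multiprocessing.freeze_support()
    print(main())
```

The call is placed first under the main guard because a frozen child process re-enters the executable and relies on freeze_support() to divert into the child bootstrap instead of re-running the program.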

dmagee commented 4 months ago

I tried freeze_support(), which works and is needed on Windows, but it makes no difference on Linux. I'm not sure what paths I would replace. It appears to look for the Python files of the app in the folder that the executable is run from, and throws an error that it can't find them.

e.g. running from the build folder....

FileNotFoundError: [Errno 2] No such file or directory: '/some/folder/build/exe.linux-x86_64-3.11/Minimal.py' (repeated once per child process)

If you run it from another location it complains they are not in that location (always with the full path of that location).
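
The path in the error is consistent with how the spawn start method appears to locate the main script on non-Windows platforms: a relative `__main__.__file__` is joined to the directory the process was started from, and the child then tries to run that path. The sketch below is a simplified illustration of that resolution step under this assumption, not the actual stdlib code; resolve_main_path is a hypothetical helper.

```python
import os


def resolve_main_path(main_file, original_dir):
    """Simplified model of how a spawned child is told where the main script is.

    main_file    -- value of __main__.__file__ in the parent (may be relative)
    original_dir -- directory the parent process was started from
    """
    if not os.path.isabs(main_file) and original_dir is not None:
        # A relative script name is resolved against the launch directory.
        main_file = os.path.join(original_dir, main_file)
    return os.path.normpath(main_file)


if __name__ == "__main__":
    # Launching from two different directories yields two different paths,
    # neither of which exists in a frozen build without the .py source.
    for launch_dir in ("/some/folder/build/exe.linux-x86_64-3.11", "/tmp"):
        print(resolve_main_path("Minimal.py", launch_dir))
```

Under this model the reported path follows the launch directory, which matches the observation that the error always names the folder the executable was run from.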

D.

marcelotduarte commented 4 months ago

Can you test with cx_Freeze 7.0 and with dev release?

You can test with the latest development build:

pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze

For conda-forge the command is:

conda install -y --no-channel-priority -S -c https://marcelotduarte.github.io/packages/conda cx_Freeze

dmagee commented 4 months ago

There's still an issue with finding the source:

$ ./Minimal
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
Starting multiprocessing:4
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.dat
--MAXPATHLEN: 4096
--filename: /my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/lib/library.zip
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 4, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/__init__.py", line 49, in <module>
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
    spawn_main(**kwds)
    spawn_main(**kwds)
    spawn_main(**kwds)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    spawn_main(**kwds)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
    exitcode = _main(fd, parent_sentinel)
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
    exitcode = _main(fd, parent_sentinel)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
    prepare(preparation_data)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    prepare(preparation_data)
    prepare(preparation_data)
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
    main_content = runpy.run_path(main_path,
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
  File "<frozen runpy>", line 254, in _get_code_from_file
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 28, in <module>
  File "Minimal.py", line 18, in main
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
              ^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/pt_p311/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 148, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 255

marcelotduarte commented 4 months ago

From what I understand you are using conda for Linux. What command did you use to install this specific version of Torch?

dmagee commented 4 months ago

Actually I set up the environment with conda, but used pip to install the modules as I couldn't get the versions I needed with conda. I think the command was:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 (from https://pytorch.org/get-started/locally/)

marcelotduarte commented 4 months ago

Using your Minimal.py and the command line:

cxfreeze --script Minimal.py build_exe --replace-paths '*='

the patch works on Linux with pip:

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu

Will be available in cx_Freeze 7.1.0.dev16

marcelotduarte commented 4 months ago

https://cx-freeze--2382.org.readthedocs.build/en/2382/faq.html#multiprocessing-support

You can test the patch in the latest development build:

pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze

For conda-forge the command is:

conda install -y --no-channel-priority -S -c https://marcelotduarte.github.io/packages/conda cx_Freeze

dmagee commented 4 months ago

So I did:

cxfreeze --script Minimal.py build_exe --replace-paths '*='

And, I now get the error:

FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'

marcelotduarte commented 4 months ago

Please check if you have cx_Freeze 7.1.0.dev16 with: cxfreeze --version

dmagee commented 4 months ago

Actually it was cxfreeze 7.1.0-dev15. Not sure how that happened, as I followed your instructions. I just tried it again and now I have 7.1.0-dev16. However, same output:

FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'

marcelotduarte commented 4 months ago

Uninstall cx_Freeze and reinstall. Are you using the pip or conda version? Probably some conflict.

dmagee commented 4 months ago

I was using pip. I uninstalled and re-installed via pip, and got the same error. I then tried uninstalling via pip and installing via conda, and I get:

$ conda install -y --no-channel-priority -S -c https://marcelotduarte.github.io/packages/conda cx_Freeze
Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): failed

UnavailableInvalidChannel: HTTP 404 NOT FOUND for channel packages/conda https://marcelotduarte.github.io/packages/conda

The channel is not accessible or is invalid.

You will need to adjust your conda configuration to proceed. Use conda config --show channels to view your configuration's current state, and use conda config --show-sources to view config file locations.

marcelotduarte commented 4 months ago

Initially, I did two tests. If you can, please do the same, to rule out any environment problems. I created one new environment using the system Python and another using conda. If you only test the second option the way I tested it, that is already enough. Then I installed cx_Freeze and PyTorch:

pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu

Note that I used the CPU version; please use that too. Afterwards I will test with CUDA.

dmagee commented 4 months ago

In the meantime I re-installed using conda by doing:

wget https://marcelotduarte.github.io/packages/conda/linux-64/cx_freeze-7.1.0.dev16-py311h459d7ec_0.conda
conda install cx_freeze-7.1.0.dev16-py311h459d7ec_0.conda

(after uninstalling using pip)

I also got the same error.

Note: The whole point of torch.multiprocessing is to use multiple GPUs, so it working just on CPU isn't that useful.

I'll try to create an entirely new environment from scratch with conda and see if it works...

dmagee commented 4 months ago

I created an entirely new environment with just cx_Freeze and torch (GPU version), and I get the same issue. This is my history:

1050 conda create --name cxtest python=3.11
1051 conda activate cxtest
1052 pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
1053 pip3 install torch torchvision torchaudio
1054 cd ../..
1055 python Minimal.py ---- Note: This works fine
1056 rm -r build
1057 cxfreeze --script Minimal.py build_exe --replace-paths '*='
1058 cd build/exe.linux-x86_64-3.11/
1059 ./Minimal

Output (Note: Ever so slightly different from before as getting SIGTERM that I didn't before, same file missing error though):

$ ./Minimal
Starting multiprocessing:4
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
Starting multiprocessing:4
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 208, in start_processes
  File "=/multiprocessing/context.py", line 243, in get_context
  File "=/multiprocessing/__init__.py", line 56, in
  File "=/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "=/multiprocessing/spawn.py", line 79, in freeze_support
  File "=/multiprocessing/spawn.py", line 122, in spawn_main
  File "=/multiprocessing/spawn.py", line 131, in _main
  File "=/multiprocessing/spawn.py", line 246, in prepare
  File "=/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/=/Minimal.py'
W0507 18:55:52.264000 140603076392768 ../=/torch/multiprocessing/spawn.py:145] Terminating process 2386939 via signal SIGTERM
W0507 18:55:52.264000 140603076392768 ../=/torch/multiprocessing/spawn.py:145] Terminating process 2386941 via signal SIGTERM
Traceback (most recent call last):
  File "=/__startup__.py", line 141, in run
  File "=/console.py", line 19, in run
  File "=/Minimal.py", line 28, in <module>
  File "=/Minimal.py", line 18, in main
  File "=/torch/multiprocessing/spawn.py", line 237, in start_processes
  File "=/torch/multiprocessing/spawn.py", line 177, in join
torch.multiprocessing.spawn.ProcessExitedException: process 3 terminated with exit code 255

marcelotduarte commented 4 months ago

In the meantime I re-installed using conda by doing:

The conda version has a bug, I'll try to solve it.

cxfreeze --script Minimal.py build_exe --replace-paths '*=' ... Output (Note: Ever so slightly different from before as getting SIGTERM that I didn't before, same file missing error though):

I suggested replace_paths precisely to remove the full path information from the traceback, but I see that it now causes the (previous) error or the SIGTERM. I'll investigate it.

However, it worked using only:

cxfreeze --script Minimal.py build_exe

or with other parameters, such as:

cxfreeze --script Minimal.py build_exe --silent

It should also work with your original setup script, or with a modified one without 'packages'.

dmagee commented 4 months ago

Using cxfreeze --script Minimal.py build_exe I get a slightly different error with my new environment (cx_Freeze installed with pip as history above):

(cxtest) $ ./Minimal
Starting multiprocessing:4
Starting multiprocessing:4
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 28, in <module>
  File "Minimal.py", line 18, in main
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
Starting multiprocessing:4
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 28, in <module>
Starting multiprocessing:4
  File "Minimal.py", line 18, in main
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
    mp = multiprocessing.get_context(start_method)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    mp = multiprocessing.get_context(start_method)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
    exec(code, module_main.__dict__)
  File "Minimal.py", line 28, in <module>
  File "Minimal.py", line 18, in main
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
    mp = multiprocessing.get_context(start_method)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
Starting multiprocessing:4
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 28, in <module>
  File "Minimal.py", line 18, in main
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 208, in start_processes
    return super().get_context(method)
    return super().get_context(method)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 56, in
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 56, in
    return super().get_context(method)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 56, in
    mp = multiprocessing.get_context(start_method)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/context.py", line 243, in get_context
    return super().get_context(method)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 56, in
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/__init__.py", line 53, in _get_freeze_context
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 79, in freeze_support
    spawn_main(**kwds)
    spawn_main(**kwds)
    spawn_main(**kwds)
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    spawn_main(**kwds)
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    exitcode = _main(fd, parent_sentinel)
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    exitcode = _main(fd, parent_sentinel)
    prepare(preparation_data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    prepare(preparation_data)
    prepare(preparation_data)
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
    main_content = runpy.run_path(main_path,
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 290, in run_path
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 254, in _get_code_from_file
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 290, in run_path
  File "<frozen runpy>", line 254, in _get_code_from_file
FileNotFoundError: [Errno 2] No such file or directory: '/my/home/folder/minimal_bug/build/exe.linux-x86_64-3.11/Minimal.py'
W0508 09:27:04.490000 140290065565504 ../../../../.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:145] Terminating process 1901879 via signal SIGTERM
W0508 09:27:04.491000 140290065565504 ../../../../.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:145] Terminating process 1901880 via signal SIGTERM
W0508 09:27:04.491000 140290065565504 ../../../../.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:145] Terminating process 1901881 via signal SIGTERM
Traceback (most recent call last):
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/__startup__.py", line 141, in run
    module_init.run(name + "__main__")
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/cx_Freeze/initscripts/console.py", line 19, in run
    exec(code, module_main.__dict__)
  File "Minimal.py", line 28, in <module>
  File "Minimal.py", line 18, in main
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 237, in start_processes
    while not context.join():
              ^^^^^^^^^^^^^^
  File "/my/home/folder/.conda/envs/cxtest/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 177, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 3 terminated with exit code 255

marcelotduarte commented 4 months ago
(cxtest) marcelo@teste7:/mnt/81da54df-d490-4cc4-a259-ffbee7f55c92/testes/2376$ python -VV
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
(cxtest) marcelo@teste7:/mnt/81da54df-d490-4cc4-a259-ffbee7f55c92/testes/2376$ pip list
Package                  Version
------------------------ -----------
cx_Freeze                7.1.0.dev16
filelock                 3.14.0
fsspec                   2024.3.1
Jinja2                   3.1.4
MarkupSafe               2.1.5
mpmath                   1.3.0
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
patchelf                 0.17.2.1
pillow                   10.3.0
pip                      24.0
setuptools               69.5.1
sympy                    1.12
torch                    2.3.0
torchaudio               2.3.0
torchvision              0.18.0
triton                   2.3.0
typing_extensions        4.11.0
wheel                    0.43.0
(cxtest) marcelo@teste7:/mnt/81da54df-d490-4cc4-a259-ffbee7f55c92/testes/2376$ 
dmagee commented 4 months ago

I can't see a difference!

(cxtest) ]$ python -VV
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
(cxtest) $ pip list
Package                  Version
------------------------ -----------
cx_Freeze                7.1.0.dev16
filelock                 3.14.0
fsspec                   2024.3.1
Jinja2                   3.1.4
MarkupSafe               2.1.5
mpmath                   1.3.0
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
patchelf                 0.17.2.1
pillow                   10.3.0
pip                      24.0
setuptools               69.5.1
sympy                    1.12
torch                    2.3.0
torchaudio               2.3.0
torchvision              0.18.0
triton                   2.3.0
typing_extensions        4.11.0
wheel                    0.43.0

dmagee commented 4 months ago

Are you sure you don't have the source files in the folder you are running the executable from? It's the only thing I can think of.

marcelotduarte commented 4 months ago

Now I understand the situation. You can test the fix in the latest development build (cx_Freeze 7.1.0.dev18):

pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze

dmagee commented 4 months ago

This version just hangs. You run the program and it outputs absolutely nothing to the screen, and doesn't return.

EDIT: If you leave it long enough, it does actually run OK. I'm timing it now to see how long it takes, but it was more than a few minutes.

History:
1002 conda activate cxtest
1003 pip uninstall cx_Freeze
1004 pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze
1005 cxfreeze --script Minimal.py build_exe
1006 cd build/exe.linux-x86_64-3.11/
1007 ./Minimal

marcelotduarte commented 4 months ago

I changed the code a bit to check __file__:

import torch

def per_device_launch_fn(current_gpu_index, num_gpu):

    for i in range(1, 1000):
        print("Train...")

num_gpu = 4

if __name__ == "__main__":
    print("Starting multiprocessing:", num_gpu, __file__)
    torch.multiprocessing.start_processes(
                    per_device_launch_fn,
                    args=(num_gpu,),
                    nprocs=num_gpu,
                    join=True,
                    start_method="spawn",
            )

$ time python Minimal.py

real    0m2,951s
user    0m6,175s
sys 0m2,158s

$ cxfreeze --script Minimal.py build_exe
$ (cd build/exe.linux-x86_64-3.11/ && time ./Minimal)

real    0m8,011s
user    0m6,434s
sys 0m2,768s

In the next run, the time is similar to the time used by the python command:

real    0m2,645s
user    0m5,945s
sys 0m2,318s

And next time too:

real    0m2,875s
user    0m5,661s
sys 0m2,496s

But when the build uses these options:

$ cxfreeze --script Minimal.py build_exe --silent --no-compress --zip-filename=

the running time is a little shorter on the first run:

real    0m5,146s
user    0m6,148s
sys 0m2,584s
dmagee commented 4 months ago

I think the timing issue may have been system related (it's a shared computer): one run took 1.5 hours last week, but today it's not taking that long. One other issue I did notice, though, is that in every sub-process of the frozen version the following is true:

if __name__ == "__main__":

resulting in this code being called N times, whereas in the Python version it's only called once. This doesn't matter in the minimal example (the training loop is called 4 times with different values of current_gpu_index), but in my real program the logic in main() is more complex: it checks sys.argv in the main process, which results in different behaviour between the Python and frozen versions. I may be able to rewrite the code to get round this, but it does strike me as a bug: presumably torch.multiprocessing must be doing something to ensure per_device_launch_fn() is called in the Python version, whereas in the frozen version it is being called via main(). I'm doing some testing to see if this is significant.

Edit: Child processes seem to be called with the following (additional*?) arguments:

--multiprocessing-fork tracker_fd=XX pipe_handle=YY

Where XX is the same for all children, and YY is different for each child. I'm assuming in the python version the torch.multiprocessing code reads these and puts sys.argv back how you might expect.

[* My program has no arguments, so it's not clear if they are additional, or replacements]
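
The argument layout described above matches what CPython's multiprocessing.spawn module checks for in a frozen executable: a second argument equal to --multiprocessing-fork, followed by name=value tokens. In that code path the child's command line is built as the executable, the flag, and those tokens, so they take the place of the parent's own arguments. Whether the cx_Freeze hook behaves identically is an assumption here. A minimal sketch for inspecting sys.argv in a frozen program follows; split_fork_args and FORK_FLAG are hypothetical names used only for illustration and are not part of cx_Freeze or torch:

```python
import sys

# Flag reported in this thread; the constant name is hypothetical.
FORK_FLAG = "--multiprocessing-fork"


def split_fork_args(argv):
    """Return (is_child, options) for an argv list.

    options maps the name=value tokens that follow the flag,
    e.g. {"tracker_fd": "5", "pipe_handle": "7"} (values kept as strings).
    """
    if FORK_FLAG not in argv:
        return False, {}
    rest = argv[argv.index(FORK_FLAG) + 1:]
    options = dict(token.split("=", 1) for token in rest if "=" in token)
    return True, options


if __name__ == "__main__":
    is_child, options = split_fork_args(sys.argv)
    print("child" if is_child else "parent", options)
```

Logging the result at the top of the main block is one way to confirm which arguments each child actually receives before deciding how to restructure main().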

marcelotduarte commented 4 months ago

The hook that I used to patch multiprocessing is based on #264, and later I discovered a similar patch (https://github.com/marcelotduarte/cx_Freeze/issues/501#issuecomment-1629733246); there is even an open pull request, https://github.com/python/cpython/pull/104607. So I don't see much more to do. I thought of one possibility and did some tests, but I didn't see a difference. Of course, that was with the test you gave me, not something big.

... but for my real program the logic is a bit more complex in main() as it checks sys.argv in the main process, which results in different behaviour in python and frozen version ...

I don't think it's very different, see how the spawn is described.
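
For reference, with the spawn start method an unfrozen interpreter re-imports the main script in each child under the module name __mp_main__, so module-level code runs again there but the __name__ == "__main__" block does not. A minimal sketch that makes this visible, using only the standard library rather than torch; describe_role and worker are hypothetical names for illustration:

```python
import multiprocessing as mp


def describe_role(module_name):
    """Map the main module's __name__ to the role of the process."""
    if module_name == "__main__":
        return "parent"
    if module_name == "__mp_main__":
        return "spawned child"
    return "imported module"


# Module-level code runs in the parent and again in every spawned child.
print("main module loaded as", __name__, "->", describe_role(__name__))


def worker(index):
    print("worker", index, "running")


if __name__ == "__main__":
    # In an unfrozen interpreter only the parent reaches this block.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

Running the same file through python and through the frozen executable and comparing the printed module names is a quick way to see whether the two environments really differ in how often the main block executes.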

dmagee commented 4 months ago

Update: Simply doing this works around it:

import sys

if __name__ == "__main__":
    no_args = len(sys.argv)
    if no_args > 1 and sys.argv[1] == "--multiprocessing-fork":
        # Frozen child process: the main block is being re-run
        print("Is fork-child")
        torch.multiprocessing.start_processes(
            per_device_launch_fn,
            args=(num_gpu,),
            nprocs=num_gpu,
            join=True,
            start_method="spawn",
        )
    else:
        # Normal initialisation for parent process
        ...

To be clear, this is not necessary in the python version.

marcelotduarte commented 4 months ago

Release 7.1.0 is out! Documentation

I'll continue to work on pytorch hook to optimize it.

marcelotduarte commented 3 months ago

Based on information from you and others, I improved the hook for multiprocessing. You can test the patch in the latest development build:

pip install --force --no-cache --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze

The provisional documentation: https://cx-freeze--2443.org.readthedocs.build/en/2443/faq.html#multiprocessing-support

The updated script is:

import torch
from multiprocessing import freeze_support

def per_device_launch_fn(current_gpu_index, num_gpu):

    for i in range(1, 1000):
        print("Train...")

num_gpu = 4

if __name__ == "__main__":
    freeze_support()
    print("Starting multiprocessing:", num_gpu, __file__)
    torch.multiprocessing.start_processes(
                    per_device_launch_fn,
                    args=(num_gpu,),
                    nprocs=num_gpu,
                    join=True,
                    start_method="spawn",
            )
marcelotduarte commented 3 months ago

Release 7.1.1 is out! Documentation