python / cpython

The Python programming language
https://www.python.org

multiprocessing.Process generates FileNotFoundError when argument isn't explicitly referenced #94765

Open JZerf opened 2 years ago

JZerf commented 2 years ago

Bug report: This is a continuation of the possible bug mentioned in issue https://github.com/python/cpython/issues/82236, which was closed because the submitter, DonnyBrown, didn't provide enough information.

DonnyBrown was getting a FileNotFoundError when starting a process with multiprocessing.Process that uses an argument without an explicit reference. I'm able to reproduce the same error using the test code DonnyBrown provided in that issue, on Ubuntu Desktop 22.04 LTS (x86-64) with CPython 3.10.4. @iritkatriel mentioned that they were unable to reproduce the error on Windows 10 with Python 3.10.

I can also reproduce the error with this slightly modified, simpler version of DonnyBrown's test code:

import multiprocessing

def demo(argument):
    print(argument)

if __name__=="__main__":
    multiprocessing.set_start_method("spawn") # Changing this to "fork" (on platforms where it is
                                              # available) can also cause the below code to work.

    process=multiprocessing.Process(target=demo, args=[multiprocessing.Value("i", 0)]) # FAILS

    #process=multiprocessing.Process(target=demo, args=[0])                            # WORKS

    #reference_To_Number=multiprocessing.Value("i", 0)                                 # WORKS
    #process=multiprocessing.Process(target=demo, args=[reference_To_Number])

    process.start()
    process.join()

The traceback I get with the above code is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

The above code can be made to work on my test system by making any of the following changes:

- Changing the start method from "spawn" to "fork" (on platforms where "fork" is available).
- Passing a plain value such as 0 as the argument instead of a multiprocessing.Value.
- Assigning the multiprocessing.Value to a variable first and passing that variable, so that an explicit reference to it is kept.

I'm not a Python expert, so maybe this is the expected behavior when spawning a process directly with a multiprocessing.Value, but it does seem odd that making any of the above changes causes the code to work, or that (based on @iritkatriel's success with DonnyBrown's test code) running it on Windows 10 (which uses the "spawn" start method) will probably cause the code to work.


akulakov commented 2 years ago

Confirmed the issue on Python 3.9 and 3.12 on macOS 11.5.2.

curonny commented 1 year ago

Confirmed the issue on Python 3.8 on macOS 13.0.1.

whitedemong commented 1 year ago

Confirmed the issue on Python 3.8.16 on MacOS 13.1 (22C65).

david-thrower commented 1 year ago

Same here on Ubuntu 22.04 with Python 3.10.6. The file it is looking for has 777 permissions: specifically [working_directory]/lib/tom-select/tom-select.css, a file created by pyvis 0.3.1 during a past run of the same script. If I rm -rf lib and run it again, I get the error [Errno 39] Directory not empty: 'vis-9.0.4'. This was stable without a lock context previously (it ran thousands of times without a problem). When I nest this code in a function, it still works; when I call that function as the target of a Process, I get these errors.

These errors were also very opaque, as they required a chain of try: ..., except Exception as err: ... print(err) clauses to get the error to print to the console. I presume there is an issue with the piping of stderr in this context as well.

Additionally, running the process in the context of "fork" does not resolve the issue (same error). The workaround of using args = [multiprocessing.Value(...)] instead of args=(0) throws the error TypeError: this type has no size:

Traceback (most recent call last):
  File "/[redacted]/my-script.py", line 346, in <module>
    processes_list = [ctx.Process(target=objective,
  File "/[redacted]/my-script.py", line 347, in <listcomp>
    args=[Value("trial", 0)]
  File "/usr/lib/python3.10/multiprocessing/context.py", line 135, in Value
    return Value(typecode_or_type, *args, lock=lock,
  File "/usr/lib/python3.10/multiprocessing/sharedctypes.py", line 74, in Value
    obj = RawValue(typecode_or_type, *args)
  File "/usr/lib/python3.10/multiprocessing/sharedctypes.py", line 49, in RawValue
    obj = _new_value(type_)
  File "/usr/lib/python3.10/multiprocessing/sharedctypes.py", line 40, in _new_value
    size = ctypes.sizeof(type_)
TypeError: this type has no size
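Incidentally, this TypeError is a separate problem from the FileNotFoundError: "trial" is not a valid typecode, so multiprocessing's shared-ctypes machinery passes the string through and ends up calling ctypes.sizeof on it. A minimal sketch of the failing call next to a valid one:

```python
import multiprocessing as mp

# "trial" is not a ctypes typecode, so sharedctypes falls through to
# ctypes.sizeof("trial"), which raises "TypeError: this type has no size".
try:
    mp.Value("trial", 0)
except TypeError as exc:
    print(exc)

# A valid typecode such as "i" (C int) or "d" (C double) works:
value = mp.Value("i", 0)  # a shared C int initialised to 0
print(value.value)
```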

As a (not ideal) workaround, I ultimately made the offending process operate under a subprocess.run() context as a parameterized script and used a Process() as a proxy between the main script and the actual process. That "worked for now".
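That workaround can be sketched roughly as follows; the inline worker source here is a hypothetical stand-in for the real parameterized script:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the real worker script: it receives its parameter on
# the command line instead of as a shared multiprocessing object.
worker_source = "import sys\nprint(int(sys.argv[1]))\n"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(worker_source)
    worker_path = f.name

try:
    # Run the worker as an independent interpreter process.
    result = subprocess.run(
        [sys.executable, worker_path, "0"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # the worker's output
finally:
    os.remove(worker_path)
```

This sidesteps pickling shared objects entirely, at the cost of serializing parameters yourself.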

RinatV commented 1 year ago

https://superfastpython.com/filenotfounderror-multiprocessing-python/

It is explained here.

A simple time.sleep(1) after p.start() helped me.

Luferov commented 1 year ago

Confirmed the issue on Python 3.9.17 on MacOS 14.

Starbuck5 commented 1 year ago

I was having a similar issue with sharing concurrency primitives (a multiprocessing.Queue in my case) across processes when using the spawn backend.

I believe this is happening because of reference counts / garbage collection. If there's a possibility that the object gets deleted by the main process while or before it is being shared, the file isn't around when the other process looks for it, and the result is a FileNotFoundError. This explains why putting it in a variable (preventing the object from being deallocated) works and passing it directly as a process argument does not.

The object getting deleted could also happen if the main process ends too soon, as referenced in this article: https://superfastpython.com/filenotfounderror-multiprocessing-python/
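A minimal sketch of this explanation, assuming the "spawn" start method: keeping an explicit reference to the Value in the parent until join() returns avoids the error on the systems where the inline-argument version fails.

```python
import multiprocessing

def demo(value):
    # Read the shared integer in the child process.
    print(value.value)

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")

    # Keeping an explicit reference in the parent prevents the Value
    # (and its underlying semaphore) from being deallocated before
    # the child has unpickled it.
    shared = multiprocessing.Value("i", 42)
    process = multiprocessing.Process(target=demo, args=[shared])
    process.start()
    process.join()
```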

haimat commented 11 months ago

We can reproduce this problem with the following piece of code using Python 3.8.10 on Ubuntu Linux 20.04:

import multiprocessing as mp

def demo(argument):
    print(argument)

def create_process():
    arg = mp.Value("i", 0)
    return mp.Process(target=demo, args=[arg])

if __name__ == "__main__":
    mp.set_start_method("spawn")  # fails
    # mp.set_start_method("fork")  # works
    # mp.set_start_method("forkserver")  # also fails

    process = create_process()
    process.start()
    process.join()

This leads to the same stacktrace as in the OP. The issue does not seem to be related to the garbage collector: disabling it before creating the process and re-enabling it after join() does not help either.

Is this a bug in CPython, or are we supposed to perform these steps in a different way?

Starbuck5 commented 11 months ago

This leads to the same stacktrace as in the OP. The issue does not seem to be related to the garbage collector: disabling it before creating the process and re-enabling it after join() does not help either.

The object gets deallocated anyway because its refcount reaches 0; that is not part of the garbage collector, I think.

After your create_process function returns, arg gets deallocated. If you create arg in the if __name__ == "__main__" block and pass it to create_process, I think the issue will be solved.
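A sketch of that suggestion applied to the create_process example above: the Value is created in the main block, and the reference stays alive until join() returns.

```python
import multiprocessing as mp

def demo(argument):
    print(argument.value)

def create_process(arg):
    # The caller owns arg, so the shared Value stays alive while the
    # child process is being spawned.
    return mp.Process(target=demo, args=[arg])

if __name__ == "__main__":
    mp.set_start_method("spawn")

    # Create the Value here and keep the reference until join() returns.
    arg = mp.Value("i", 0)
    process = create_process(arg)
    process.start()
    process.join()
```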

ziegenbalg commented 5 months ago

Hitting this problem as well.

Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
  File "/usr/lib64/python3.10/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib64/python3.10/multiprocessing/synchronize.py", line 87, in _cleanup
    sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory

zuliani99 commented 5 months ago

In my conda environment I'm using python=3.12.2, yet the warning below refers to the multiprocessing module from Python 3.8.

I've already double-checked the Python version using python --version, and I'm on the latest version.

File "/opt/anaconda/anaconda3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/opt/anaconda/anaconda3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/opt/anaconda/anaconda3/lib/python3.8/multiprocessing/synchronize.py", line 87, in _cleanup
    sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory

To be specific, I'm using PyTorch's multiprocessing module to spawn multiple processes for multi-GPU training. This is the issue I'm facing.

Chidu2000 commented 5 months ago

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Downloads/Pensieve-DRL-Master-thesis/pensieve-pytorch/hyp_param_test.py", line 114, in central_agent
    s_batch, a_batch, r_batch, terminal, info, net_env = exp_queues[i].get() # for all the 3 agents, so a vector of size 3 (i.e. s,a,r_batch)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 495, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 86, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

Is this issue on all Python 3.x versions?

yeonfish6040 commented 5 months ago

Confirmed the issue on Python 3.10 on macOS 14.4.1.

LSimon95 commented 3 months ago

https://superfastpython.com/filenotfounderror-multiprocessing-python/

It is explained here.

A simple time.sleep(1) after p.start() helped me.

time.sleep works on Ubuntu 22.04.3 with Python 3.10, but it is still an issue.