p-christ / Deep-Reinforcement-Learning-Algorithms-with-PyTorch

PyTorch implementations of deep reinforcement learning algorithms and environments
MIT License

FileNotFoundError: [Errno 2] No such file or directory #45

Open shuferhoo opened 4 years ago

shuferhoo commented 4 years ago

I ran Cart_Pole.py with A3C and A2C on Linux and got the following error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/data/htx/git_study/agents/actor_critic_agents/A2C.py", line 19, in update_shared_model
    new_grads = gradient_updates_queue.get()
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/htx/.env/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

elk-april commented 4 years ago

Have you solved this problem? I also ran into it in a different codebase.

MichaelXCChen commented 3 years ago

The cause seems to be explained in the multiprocessing documentation (https://docs.python.org/3.6/library/multiprocessing.html#pipes-and-queues). To quote: "Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue. See Programming guidelines."
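
As a rough illustration of the warning quoted above (not code from this repository), the sketch below uses a plain multiprocessing queue and consumes every item before joining the workers. The names producer and drain_then_join are hypothetical, and the "fork" start method is assumed because the report is from Linux.

```python
import multiprocessing as mp


def producer(worker_id, queue):
    # Hypothetical worker: puts a single item on a plain multiprocessing queue.
    queue.put((worker_id, [worker_id] * 4))


def drain_then_join(num_workers=3):
    # Assumption: "fork" start method, as on the Linux machine in the report.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    processes = [
        ctx.Process(target=producer, args=(i, queue)) for i in range(num_workers)
    ]
    for p in processes:
        p.start()
    # Consume every item first, so no worker is left waiting to flush its
    # buffered items to the pipe ...
    results = [queue.get() for _ in processes]
    # ... and only then join the workers.
    for p in processes:
        p.join()
    return sorted(worker_id for worker_id, _ in results)


if __name__ == "__main__":
    print(drain_then_join())
```

Reversing the order (join first, get afterwards) is the pattern the documentation warns about.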

A potential solution is suggested here: https://stackoverflow.com/questions/45866698/multiprocessing-processes-wont-join

So putting all of this together, the problem is with this bit: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/79fc69c1010f91795ca5319bc263f5c3442b0d25/agents/actor_critic_agents/A3C.py#L22-L27

Change line 26 to: gradient_updates_queue = multiprocessing.Manager().Queue()

Depending on the size and number of items placed on the results queue, it may be prudent to make the same change to line 25.
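
For reference, here is a minimal, standalone sketch of the manager-queue pattern using only the standard library. It is not the repository's A3C code: worker and collect_updates are hypothetical names, plain Python lists stand in for gradient tensors, and the "fork" start method is assumed to match the Linux setup in the report.

```python
import multiprocessing as mp


def worker(worker_id, gradient_updates_queue):
    # Stand-in for an A3C worker: push one fake "gradient" list, then exit.
    gradient_updates_queue.put([worker_id] * 3)


def collect_updates(num_workers=4):
    # Assumption: "fork" start method, as on the Linux machine in the report.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        # The queue lives in the manager's server process, not in the workers.
        gradient_updates_queue = manager.Queue()
        processes = [
            ctx.Process(target=worker, args=(i, gradient_updates_queue))
            for i in range(num_workers)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        # The workers have already exited, but their items are still
        # available because the manager process holds them.
        updates = [gradient_updates_queue.get() for _ in range(num_workers)]
    return sorted(updates)


if __name__ == "__main__":
    print(collect_updates())
```

The trade-off is that every put and get goes through the manager process, so a manager queue is typically slower than a plain multiprocessing queue.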

This seems to fix the problem for A3C. I haven't tried A2C yet.