linnaeushuang / pensieve-pytorch

MIT License

Issue when training the model - multiprocessing #13

Open Chidu2000 opened 6 months ago

Chidu2000 commented 6 months ago

I am training the model for x epochs (x > 1). After all epochs have finished executing, I get the following error:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Downloads/Pensieve-DRL-Master-thesis/pensieve-pytorch/pensieve_torch.py", line 140, in central_agent
    s_batch, a_batch, r_batch, terminal, info = exp_queues[i].get() #from experience replay buffer (pushed to it by the local agents from which coordinator downloads experiences)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 495, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 86, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
0:00:15.400819

I have also tried using mp.Event() to synchronize the local agents and the central agent, but the issue persists. Any help with this would be appreciated.
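
For reference, here is a minimal sketch of the kind of mp.Event() handshake described above. It rests on an unconfirmed assumption: that the FileNotFoundError in rebuild_storage_fd occurs because a local agent has already exited when the central agent calls get(), so the socket used to hand over the tensor's file descriptor no longer exists. The names local_agent, run_once, and consumed are hypothetical and not taken from pensieve_torch.py, the payload is plain Python data so that the sketch runs without torch, and the "fork" start method is assumed because the traceback comes from Linux.

```python
import multiprocessing as mp


def local_agent(exp_queue, consumed):
    # Hypothetical stand-in for a local agent: push one batch of experience.
    exp_queue.put(([0.1, 0.2], [1, 0], [1.0, -1.0], True, {"entropy": 0.5}))
    # Stay alive until the central agent confirms it has read the batch.
    # Assumption: with torch tensors, the producer must still be running
    # when the consumer unpickles them from the queue.
    consumed.wait()


def central_agent(exp_queue, consumed):
    # Read the batch first, then release the waiting local agent.
    batch = exp_queue.get()
    consumed.set()
    return batch


def run_once():
    ctx = mp.get_context("fork")  # assumption: Linux, as in the traceback
    exp_queue = ctx.Queue(1)
    consumed = ctx.Event()
    worker = ctx.Process(target=local_agent, args=(exp_queue, consumed))
    worker.start()
    batch = central_agent(exp_queue, consumed)
    worker.join()
    return batch


if __name__ == "__main__":
    s_batch, a_batch, r_batch, terminal, info = run_once()
    print(terminal, info)
```

The ordering is the point of the sketch: the event is set only after get() has returned, and the local agent returns only after the event is set. If the same assumption holds in the real training loop, an alternative would be to put NumPy arrays or plain lists on the queue instead of tensors, which avoids file-descriptor passing altogether.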