minerllabs / minerl

MineRL Competition for Sample Efficient Reinforcement Learning - Python Package
http://minerl.io/docs/

Multiple instances of minerl running in same execution environment causes shutdowns #177

Open jon-chuang opened 5 years ago

jon-chuang commented 5 years ago

I have experienced multiple shutdowns of minerl when (1) running multiple instances (e.g. 5) in parallel, and (2) interrupting a Jupyter kernel that is communicating with minerl.

The first is more serious and the second is just additional info.

I believe this is a bug.

The error messages I get include the following:

~/.local/lib/python3.7/site-packages/gym/wrappers/time_limit.py in step(self, action)
     13     def step(self, action):
     14         assert self._elapsed_steps is not None, "Cannot call env.step() before calling reset()"
---> 15         observation, reward, done, info = self.env.step(action)
     16         self._elapsed_steps += 1
     17         if self._elapsed_steps >= self._max_episode_steps:

~/miniconda/envs/py37/lib/python3.7/site-packages/minerl/env/core.py in step(self, action)
    525         # Receive reward done and sent.
    526         reply = comms.recv_message(self.client_socket)
--> 527         reward, done, sent = struct.unpack('!dbb', reply)
    528
    529         # Receive info from the environment.

TypeError: a bytes-like object is required, not 'NoneType'

Failed to reset (socket error), trying again!
Cleaning connection! Something must have gone wrong.
Failed to reset (socket error), trying again!
Cleaning connection! Something must have gone wrong.
Connection with Minecraft client cleaned more than once; restarting.
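For context on the TypeError in the traceback: `struct.unpack` raises exactly this error when it is handed `None` instead of bytes, which is what `reply` appears to be once the connection to the Minecraft client is gone. A minimal sketch of a more explicit check is below. Only the `'!dbb'` format string comes from the traceback; `parse_step_reply` and `ClientDisconnected` are hypothetical names used for illustration and are not part of minerl.

```python
import struct

# Format taken from the traceback: network byte order, one double (reward)
# followed by two signed bytes (done, sent).
REPLY_FORMAT = "!dbb"
REPLY_SIZE = struct.calcsize(REPLY_FORMAT)


class ClientDisconnected(Exception):
    """Hypothetical error type: no usable reply came back from the client."""


def parse_step_reply(reply):
    """Unpack a step reply, failing with a descriptive error when it is missing.

    Hypothetical helper, not minerl's implementation.
    """
    if reply is None:
        raise ClientDisconnected("no reply received; the client may have shut down")
    if len(reply) != REPLY_SIZE:
        raise ClientDisconnected(
            "expected %d bytes, got %d" % (REPLY_SIZE, len(reply))
        )
    reward, done, sent = struct.unpack(REPLY_FORMAT, reply)
    return reward, bool(done), bool(sent)
```

With a check like this, a dropped connection would surface as a named error that calling code can catch, rather than as a TypeError from inside `struct.unpack`.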

Recovering can require either calling gym.make again for the Minecraft environment, which is not a big issue, or restarting my Jupyter kernel, which is extremely disruptive to my running experiments.

There are other error messages which I will add to this issue once I encounter them again.

MadcowD commented 5 years ago

Can you attach your ./logs?

In reply to the message from jon-chuang (notifications@github.com) of Thursday, 25 July 2019, 17:43.



-- William Guss


jon-chuang commented 5 years ago

mc_6.log mc_7.log

jon-chuang commented 5 years ago

I'm not sure which logs correspond to my errors. When I get the same issue, I will attach my logs.

MadcowD commented 5 years ago

Sweet! I have seen this error before :)))) I'm on it!

jon-chuang commented 5 years ago

Great, thanks!

jon-chuang commented 5 years ago

I've noticed this error only occurs when I start multiple instances of minerl at the same time. Consequently, a quick fix is to stagger the starts; it's annoying, but I haven't encountered errors since.
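A rough sketch of what staggering the starts could look like is below. The helper `make_envs_staggered`, the 30-second default delay, and the environment name in the usage comment are all assumptions for illustration; the delay actually needed on a given machine is not known from this thread.

```python
import time


def make_envs_staggered(factories, delay_s=30.0, sleep=time.sleep):
    """Call each zero-argument factory in turn, pausing between launches.

    Hypothetical helper: `delay_s` is a guess, not a value from minerl.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    envs = []
    for index, factory in enumerate(factories):
        if index > 0:
            # Give the previously launched instance time to finish starting.
            sleep(delay_s)
        envs.append(factory())
    return envs


# Hypothetical usage with MineRL (environment name chosen for illustration):
#   import gym
#   import minerl
#   factories = [lambda: gym.make("MineRLNavigateDense-v0") for _ in range(5)]
#   envs = make_envs_staggered(factories, delay_s=30.0)
```

This only avoids simultaneous starts within one process; separate training processes would need their own coordination, for example a per-process start offset.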

brandonhoughton commented 5 years ago

I have been able to reproduce this: https://gist.github.com/brandonhoughton/69c2a85043471c0043f9c9a003d9bf91

NotNANtoN commented 5 years ago

I have exactly the same issue when trying to train in 4 separate processes on a machine with 4 GPUs. The error messages are all over each other:

uesr@basegpu1:/home/user/Deep-RL-Torch$ ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDOUT]: [ERROR] Video observation is null; please notify the developer.
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: java.lang.NullPointerException
ERROR:minerl.env.malmo.instance.54e7b2:Exception in thread "EnvServerSocketHandler" [04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.stepSync(MalmoEnvServer.java:507)
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.step(MalmoEnvServer.java:534)
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.access$400(MalmoEnvServer.java:51)
ERROR:minerl.env.malmo.instance.54e7b2:[04:07:54] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer$1.run(MalmoEnvServer.java:154)
Traceback (most recent call last):
  File "train.py", line 227, in <module>
    trainer.run(600000, render=False, verbose=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 174, in run
    self.fill_replay_buffer(n_actions=self.n_initial_random_actions)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 103, in fill_replay_buffer
    explore=True, fully_random=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 120, in _act
    next_state, reward, done, _ = env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/core.py", line 285, in step
    return self.env.step(self.action(action))
  File "/informatik2/students/home/8wiehe/.local/lib/python3.6/site-packages/gym/core.py", line 261, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/minerl/env/core.py", line 536, in step
    reward, done, sent = struct.unpack('!dbb', reply)
TypeError: a bytes-like object is required, not 'NoneType'
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Connection with Minecraft client cleaned more than once; restarting.
ERROR:minerl.env.malmo:Attempted to send kill command to minecraft process and failed.
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [STDOUT]: [ERROR] Video observation is null; please notify the developer.
ERROR:minerl.env.malmo.instance.af6e12:Exception in thread "EnvServerSocketHandler" [04:09:26] [EnvServerSocketHandler/INFO]: [STDERR]: java.lang.NullPointerException
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.stepSync(MalmoEnvServer.java:507)
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.step(MalmoEnvServer.java:534)
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer.access$400(MalmoEnvServer.java:51)
ERROR:minerl.env.malmo.instance.af6e12:[04:09:26] [EnvServerSocketHandler/INFO]: [STDERR]: at com.microsoft.Malmo.Client.MalmoEnvServer$1.run(MalmoEnvServer.java:154)
Traceback (most recent call last):
    trainer.run(600000, render=False, verbose=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 174, in run
    self.fill_replay_buffer(n_actions=self.n_initial_random_actions)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 103, in fill_replay_buffer
    explore=True, fully_random=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 120, in _act
    next_state, reward, done, _ = env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/core.py", line 285, in step
    return self.env.step(self.action(action))
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/core.py", line 261, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/informatik2/students/home/user/.local/lib/python3.6/site-packages/minerl/env/core.py", line 536, in step
    reward, done, sent = struct.unpack('!dbb', reply)
TypeError: a bytes-like object is required, not 'NoneType'
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Connection with Minecraft client cleaned more than once; restarting.
  File "train.py", line 227, in <module>
    trainer.run(600000, render=False, verbose=True)
  File "/srv/home/user/Deep-RL-Torch/trainer.py", line 222, in run
    self.policy.optimize()
  File "/srv/home/user/Deep-RL-Torch/policies.py", line 104, in optimize
    self.policy.optimize()
  File "/srv/home/user/Deep-RL-Torch/policies.py", line 313, in optimize
    transitions = self.get_transitions()
  File "/srv/home/user/Deep-RL-Torch/policies.py", line 347, in get_transitions
    importance_weights = torch.from_numpy(importance_weights).float()
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.
ERROR:minerl.env.malmo.instance.fb8a11:[08:26:06] [Client thread/INFO]: [STDOUT]: CLIENT request state: ERROR_TIMED_OUT_WAITING_FOR_EPISODE_PAUSE
ERROR:minerl.env.malmo.instance.fb8a11:[08:26:06] [Client thread/INFO]: [STDOUT]: CLIENT enter state: ERROR_TIMED_OUT_WAITING_FOR_EPISODE_PAUSE
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Failed to reset (socket error), trying again!
ERROR:minerl.env.core:Cleaning connection! Something must have gone wrong.
ERROR:minerl.env.core:Connection with Minecraft client cleaned more than once; restarting.

shwang commented 4 years ago

We ended up fixing this in https://github.com/HumanCompatibleAI/minerl/pull/5, which led to another parallelization error that is addressed by https://github.com/HumanCompatibleAI/minerl/pull/6.

I'd be happy to merge this in the future if the maintainers are interested (though I'm a bit busy right now, so probably in a week or two).