minerllabs / minerl

MineRL Competition for Sample Efficient Reinforcement Learning - Python Package
http://minerl.io/docs/

Clients crash on launch with NullPointerError #700

Open zaptrem opened 1 year ago

zaptrem commented 1 year ago

Hello, I'm trying to use the DreamerV3 repo's Minecraft preset, which relies on MineRL, but the clients crash on launch. I'm using WSL and have confirmed that Minecraft runs fine in the virtual environment and is correctly accelerated by OpenGL. I'm using OpenJDK 8, as required. Any idea what could be going wrong?

 Received Mission token e4da9a82-c004-46d8-940d-6943226fc538:0:0:1:true
DEBUG:minerl.env.malmo.instance.f66cde:[01:09:53] [EnvServerSocketHandler/INFO]: Received mission init command  <MissionInit xmlns="http://ProjectMalmo.microsoft.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" SchemaVersion="" PlatformVersion="0.37.0"><Mission><About><Summary>MineRLEnv-v1</Summary></About><ModSettings><MsPerTick>50</MsPerTick></ModSettings><ServerSection><ServerInitialConditions><Time><StartTime>0</StartTime><AllowPassageOfTime>true</AllowPassageOfTime></Time><AllowSpawning>true</AllowSpawning></ServerInitialConditions><ServerHandlers><DefaultWorldGenerator forceReset="true" generatorOptions="{}"/><ServerQuitWhenAnyAgentFinishes/></ServerHandlers></ServerSection><AgentSection mode="Survival"><Name>MineRLAgent0</Name><AgentStart><BreakSpeedMultiplier>100.0</BreakSpeedMultiplier></AgentStart><AgentHandlers><FileBasedPerformanceProducer/><PauseCommand/><VideoProducer want_depth="false"><Width>64</Width><Height>64</Height></VideoProducer><ObservationFromFullInventory flat="false"/><ObservationFromEquippedItem/><ObservationFromFullStats/><HumanLevelCommands/><CameraCommands/><PlaceCommands/><EquipCommands/><SimpleCraftCommands/><NearbyCraftCommands/><NearbySmeltCommands/></AgentHandlers></AgentSection></Mission><ExperimentUID>e4da9a82-c004-46d8-940d-6943226fc538</ExperimentUID><ClientRole>0</ClientRole><ClientAgentConnection><ClientIPAddress>127.0.0.1</ClientIPAddress><ClientMissionControlPort>0</ClientMissionControlPort><ClientCommandsPort>0</ClientCommandsPort><AgentIPAddress>127.0.0.1</AgentIPAddress><AgentMissionControlPort>0</AgentMissionControlPort><AgentVideoPort>0</AgentVideoPort><AgentDepthPort>0</AgentDepthPort><AgentLuminancePort>0</AgentLuminancePort><AgentObservationsPort>0</AgentObservationsPort><AgentRewardsPort>0</AgentRewardsPort><AgentColourMapPort>0</AgentColourMapPort></ClientAgentConnection></MissionInit>
ERROR:minerl.env.malmo.instance.ebd7c0:[01:09:53] [EnvServerSocketHandler/ERROR]: Error while processing commands
ERROR:minerl.env.malmo.instance.ebd7c0:java.lang.NullPointerException: null
DEBUG:minerl.env.malmo.instance.ebd7c0: at com.minerl.multiagent.env.EnvServer.setGameSetttings(EnvServer.java:338) ~[mcprec-6.13.jar:?]
DEBUG:minerl.env.malmo.instance.ebd7c0: at com.minerl.multiagent.env.EnvServer.initMission(EnvServer.java:260) ~[mcprec-6.13.jar:?]
DEBUG:minerl.env.malmo.instance.ebd7c0: at com.minerl.multiagent.env.EnvServer$1.run(EnvServer.java:131) [mcprec-6.13.jar:?]
Error inside process worker: Traceback (most recent call last):
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/core/worker.py", line 202, in _loop
    state, result = function(state, *args, **kwargs)
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/core/parallel.py", line 40, in _respond
    result = getattr(state, name)(*args, **kwargs)
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/core/wrappers.py", line 158, in step
    obs = self.env.step(action)
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/core/wrappers.py", line 117, in step
    return self.env.step({**action, self._key: index})
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/envs/minecraft.py", line 99, in step
    obs = self.env.step(action)
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/core/wrappers.py", line 25, in step
    return self.env.step(action)
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/envs/minecraft_base.py", line 93, in step
    obs = self._reset()
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/envs/minecraft_base.py", line 114, in _reset
    obs = self._env.step({'reset': True})
  File "/home/zaptrem/dreamerv3/dreamerv3/embodied/envs/from_gym.py", line 55, in step
    obs = self._env.reset()
  File "/home/zaptrem/dreamerv3/env/lib/python3.9/site-packages/minerl/env/_singleagent.py", line 22, in reset
    multi_obs = super().reset()
  File "/home/zaptrem/dreamerv3/env/lib/python3.9/site-packages/minerl/env/_multiagent.py", line 446, in reset
    self._send_mission(self.instances[0], agent_xmls[0], self._get_token(0, ep_uid))  # Master
  File "/home/zaptrem/dreamerv3/env/lib/python3.9/site-packages/minerl/env/_multiagent.py", line 606, in _send_mission
    ok, = struct.unpack("!I", reply)
TypeError: a bytes-like object is required, not 'NoneType'

Miffyli commented 1 year ago

As far as I am aware, the Dreamer experiments relied on an older version of MineRL (v0.4.4). Make sure to install that version.

Also, double-check your Java version by running java -version and javac -version (single dash; Java 8 does not accept --version); the output should contain something along the lines of "1.8...". Sometimes a different Java installation is picked up even if JDK 8 is installed.
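
The check above can also be scripted, so that it runs in the same shell and virtual environment as the training code. The following is a minimal sketch, not part of MineRL: the helper names `parse_java_version`, `is_java8`, and `active_java_version` are made up for illustration. It relies on `java -version` printing its banner (e.g. `openjdk version "1.8.0_362"`) to stderr.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_version(banner: str) -> Optional[str]:
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def is_java8(version: Optional[str]) -> bool:
    """Java 8 reports itself as 1.8.x; Java 9+ dropped the leading '1.'."""
    return version is not None and version.startswith("1.8")


def active_java_version() -> Optional[str]:
    """Return the version of whichever `java` is first on PATH, or None."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    version = active_java_version()
    print(f"active java: {version!r}, looks like Java 8: {is_java8(version)}")
```

Running this immediately before launching the training script shows which Java the MineRL subprocess is likely to inherit from PATH.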

If those do not help, paste the full log of error messages. A more informative error is usually higher up.

zaptrem commented 1 year ago

Thanks for getting back so quickly! I switched to v0.4.4 and it seems to be working now. However, even with 2 envs on a system with 32 GB of RAM and 12 cores, the clients appear to be stepping pretty slowly (4/20 times per second). Also, shortly after completing the first episode/mission, all of the clients crash or are killed (the behavior is the same with 8, 4, and 2 clients). Here's the log: https://cdn.discordapp.com/attachments/378227457750466574/1093581895536738384/message.txt

Miffyli commented 1 year ago

The "ticks per second" figure does not directly give the step count, but yes, it is running a bit slow. MineRL in general runs pretty slowly, but that number of warnings does sound a bit suspicious. Games want CPUs with high core clocks, so that might explain the slow behaviour (as might the machine doing other CPU-heavy work).

As for the crashes, I recommend you try a bare Gym loop (create the env, step through it with random actions) and see if it crashes. If not, something in the training code makes it act up (hard to say what exactly; it could be a lack of resets). However, the env does tend to crash randomly at times, so I recommend wrapping all step/reset calls in some sort of safety check (if step/reset fails, recreate the env).
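
One way to build such a safety check is a thin wrapper that rebuilds the environment whenever a call fails. This is a minimal sketch under stated assumptions, not an official MineRL utility: the class name `ResilientEnv` and its `make_env` and `max_retries` parameters are hypothetical, and it assumes the older Gym API in which `step` returns `(obs, reward, done, info)`.

```python
from typing import Any, Callable


class ResilientEnv:
    """Recreate the underlying env whenever step() or reset() raises."""

    def __init__(self, make_env: Callable[[], Any], max_retries: int = 3):
        self._make_env = make_env        # factory that builds a fresh env
        self._max_retries = max_retries  # recreation attempts per reset()
        self._env = make_env()
        self.restarts = 0                # how many times the env was rebuilt

    def _recreate(self) -> None:
        try:
            self._env.close()
        except Exception:
            pass  # a crashed env may not be able to close cleanly
        self._env = self._make_env()
        self.restarts += 1

    def reset(self):
        for attempt in range(self._max_retries + 1):
            try:
                return self._env.reset()
            except Exception:
                if attempt == self._max_retries:
                    raise  # give up and surface the last error
                self._recreate()

    def step(self, action):
        try:
            return self._env.step(action)
        except Exception:
            # The broken episode cannot be resumed: rebuild the env, start a
            # fresh episode, and report the old one as finished.
            self._recreate()
            obs = self.reset()
            return obs, 0.0, True, {"env_restarted": True}
```

Usage would look like `env = ResilientEnv(lambda: gym.make('MineRLNavigateDense-v0'))`. Reporting a crashed step as `done=True` with zero reward is a design choice that keeps the training loop running; the `env_restarted` flag lets the caller discard the truncated episode if that matters.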

zaptrem commented 1 year ago

Hmm, I downloaded the repo as it was in 0.4.4 to run the test scripts and get this when running the multi-agent one even with a single agent, which says to report this issue to the maintainers:

https://pastebin.com/2QA6uMf3

Miffyli commented 1 year ago

MineRL has never had official support for multi-agent code, even though there are some remnants of an implementation in the codebase, so those features are really finicky if they even exist. The error message from Gradle asks you to report the issue to the Malmö creators, but that repo is not maintained either.

In short: multi-agent is not supported and we cannot provide help with it.

zaptrem commented 1 year ago

Thanks for letting me know, but I also ran the multi-agent test script with --single enabled (which makes it single agent) in order to get a quick gym action loop and still had the same issue.

Miffyli commented 1 year ago

The whole multi-agent side of the code does not work. Note that the environment code file is called "multiagent", but it still does not really support multiple agents. You should also create the environments via the Gym API rather than directly via the environment classes, because the wrappers configure parts of the environment.

zaptrem commented 1 year ago

I ran the following code using gym directly as suggested:

import minerl
import gym
env = gym.make('MineRLNavigateDense-v0')

obs = env.reset()

done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

But I got the same error: WARNING: Illegal reflective access by com.microsoft.Malmo.Launcher.GradleStart (file:/tmp/tmptzi4y445/Minecraft/build/libs/MalmoMod-0.37.0-fat.jar) to field java.lang.ClassLoader.sys_paths

Full Log: https://pastebin.com/dSDumbdz

Miffyli commented 1 year ago

Hmm, the only suggestion that comes to mind is to double-check which Java version is active when you launch that command, i.e. before you run your Python command on the command line, check what java -version says (it should say the usual "1.8..."). Java errors are often the result of the wrong Java version being used (and sometimes it sneakily changes on people's systems).

If the Java version does check out ("1.8..."), I am at a loss for what else to suggest :(. Either the installation is corrupted (reinstalling could help), or this is a new bug I have not seen before. In the latter case, information on your system (OS, Python version, exact Java version, etc.) would help a little, but unfortunately I won't be able to help much beyond that.
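
The system details mentioned above can be gathered in one go. This is a small sketch using only the Python standard library; the function names `package_version`, `java_banner`, and `environment_report` are illustrative, and the choice of fields is an assumption about what is useful to paste into a report.

```python
import platform
import shutil
import subprocess
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Installed version of a Python package, or a placeholder if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def java_banner() -> str:
    """Raw `java -version` output for whichever java is first on PATH."""
    java = shutil.which("java")
    if java is None:
        return "java not found on PATH"
    # The banner is written to stderr; fall back to stdout just in case.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


def environment_report() -> dict:
    """Collect OS, Python, package, and Java details as plain strings."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "minerl": package_version("minerl"),
        "gym": package_version("gym"),
        "java": java_banner(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the same virtual environment as the failing script matters, since both the Python packages and the Java binary found on PATH can differ between shells.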