microsoft / malmo

Project Malmo is a platform for Artificial Intelligence experimentation and research built on top of Minecraft. We aim to inspire a new generation of research into challenging new problems presented by this unique environment. --- For installation instructions, scroll down to *Getting Started* below, or visit the project page for more information:
https://www.microsoft.com/en-us/research/project/project-malmo/
MIT License

Client sends MALMOOK, but mission never starts #236

Open lydiatliu opened 8 years ago

lydiatliu commented 8 years ago

I get `DEBUG: Looking for client, received reply from 10.190.104.80: MALMOOK`, and I keep calling peekWorldState(), but world_state.is_mission_running never evaluates to True.

timhutton commented 8 years ago

Suggestion: add world_state.mission_started flag, to help debug whether mission ever started.

Suggestion: have agent_host print out something useful when mission control messages are received.

Suggestion: have agent_host ping the Mod to find out whether still running.

DaveyBiggers commented 8 years ago

I have a hunch this might be a thread deadlock issue caused by MissionRecordingSpec reuse, possibly related to #256.

DaveyBiggers commented 8 years ago

Might be changing my mind... could this just be another case of #118?

timhutton commented 8 years ago

With the 0.17.0 release, we can investigate this issue more:

Reduced to P2 since there's a workaround: put a timeout on waiting for the mission to start.
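For anyone needing the workaround, here is a minimal sketch of a start-up wait with a timeout. It only assumes an object exposing peekWorldState() whose result has an is_mission_running attribute, as used earlier in this thread; the function name, the default timeout, and the poll interval are illustrative choices, not part of Malmo.

```python
import time


def wait_for_mission_start(agent_host, timeout_s=60.0, poll_s=0.1):
    """Poll until the mission is running; return False if timeout_s elapses first.

    agent_host is anything with a peekWorldState() method whose result has an
    is_mission_running attribute (the names used earlier in this thread).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        world_state = agent_host.peekWorldState()
        if world_state.is_mission_running:
            return True
        if time.monotonic() >= deadline:
            return False  # caller decides whether to retry or give up
        time.sleep(poll_s)
```

A caller would treat a False return as "the client acknowledged but never started" and either retry the mission or report an error, rather than polling forever.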

DaveyBiggers commented 8 years ago

Closing this as we can no longer reproduce it.

DaveyBiggers commented 7 years ago

I believe I've just reproduced this... It's possible to accidentally "queue up" missions, due to a race condition in the client state machine. What happens is this:

  1. Agent A tries to start a mission.
  2. Dormant Minecraft client receives the MissionInit XML, returns MALMOOK, and begins processing the XML. The client only leaves the "dormant" state when it has finished parsing the XML.
  3. Agent A begins waiting for the mission to start.
  4. Agent B tries to send a mission.
  5. The Minecraft client is still processing the first MissionInit XML message, and is therefore still in the dormant state, so it believes it can accept this new mission too, and sends the MALMOOK response.
  6. Agent B begins waiting for the mission to start.

Agent B's mission will eventually start, but only after A's has finished.
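The sequence above can be reproduced with a toy model. This is a sketch of the behaviour as described, not the Mod's actual state machine: the class, the state names other than "dormant", the "REFUSED" reply, and the leave_dormant_on_receipt switch (one hypothetical way to close the window) are all made up for illustration.

```python
from collections import deque


class ToyClient:
    """Toy model of the client; not the Mod's real state machine."""

    def __init__(self, leave_dormant_on_receipt=False):
        self.state = "dormant"
        self.queue = deque()  # missions acknowledged with MALMOOK
        self.leave_dormant_on_receipt = leave_dormant_on_receipt

    def receive_mission_init(self, agent):
        if self.state != "dormant":
            return "REFUSED"  # placeholder reply, not a real protocol string
        self.queue.append(agent)
        if self.leave_dormant_on_receipt:
            self.state = "parsing"  # hypothetical fix: close the window at once
        return "MALMOOK"

    def finish_parsing(self):
        # In the racy model this is the first point at which the state changes.
        self.state = "running"


racy = ToyClient()
replies_racy = [racy.receive_mission_init("A"),   # step 2
                racy.receive_mission_init("B")]   # step 5, A's XML still parsing
racy.finish_parsing()

guarded = ToyClient(leave_dormant_on_receipt=True)
replies_guarded = [guarded.receive_mission_init("A"),
                   guarded.receive_mission_init("B")]
```

In the racy model both agents receive MALMOOK and B ends up queued behind A. In the guarded model B is turned away immediately and is free to try elsewhere.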

The more involved the XML, the longer it takes to process, and so the greater the chance of catching the mod while it is still in the dormant state. I've managed to create a queue of around ten missions, in which case the tenth agent is kept waiting while nine other missions run.

In the single-client case this is maybe not so bad, since the tenth agent would have to wait its turn anyway. But if there are, say, ten clients in the client pool, all ten agents could run simultaneously. Instead, nine clients sit dormant while the first client does all the work in serial.
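To make the client-pool consequence concrete, here is a rough sketch of how ten agents would be distributed in the worst case, where every request arrives while the first XML is still being parsed. It assumes each agent tries the clients in pool order and takes the first one that acknowledges; that selection policy and the function itself are assumptions of the sketch, not Malmo code.

```python
def assign(agents, pool_size, accepts_while_parsing):
    """Return {client_index: [agents]} for agents that all arrive while the
    first mission's XML is still being parsed.

    Assumes each agent tries clients in pool order and takes the first one
    that acknowledges; that policy is an assumption of this sketch.
    """
    queues = {i: [] for i in range(pool_size)}
    for agent in agents:
        for i in range(pool_size):
            # A racy client acknowledges even if it already holds a mission;
            # a guarded client only acknowledges when it holds none.
            if accepts_while_parsing or not queues[i]:
                queues[i].append(agent)
                break
    return queues


agents = [f"agent{n}" for n in range(10)]
racy = assign(agents, pool_size=10, accepts_while_parsing=True)
guarded = assign(agents, pool_size=10, accepts_while_parsing=False)
racy_load = [len(racy[i]) for i in range(10)]
guarded_load = [len(guarded[i]) for i in range(10)]
```

With the racy behaviour the first client ends up holding all ten missions and the other nine hold none. With the guarded behaviour each client holds exactly one.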