microsoft / TextWorld

TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

Game freezing / general robustness to wrapped game engine #161

Open vzhong opened 5 years ago

vzhong commented 5 years ago

I've noticed that when randomly sampling actions in real text games (e.g. Z-machine games not generated by TextWorld), the game inevitably freezes or segfaults at some point (it seems Jericho has the same issue). Since TextWorld is an RL environment for text games, can we make it more robust to these problems? For example, by reporting a timeout failure or a game-crashed failure?

Here's a code snippet that consistently freezes on my machine:

# !mkdir -p games
# !wget -q http://www.ifarchive.org/if-archive/games/zcode/Balances.z5 -O games/Balances.z5

import textworld
import itertools
import tqdm
import numpy as np

np.random.seed(0)
episodes = 50
episode_len = 500

scores = []

for episode in range(episodes):
    print('episode {}'.format(episode))
    env = textworld.start('./games/Balances.z5')
    env.reset()
    env.seed(0)

    # Build every verb-noun pair from the game's dictionary.
    # Note: env._jericho is a private attribute (leading underscore) of the TextWorld env.
    verbs = [w.word for w in env._jericho.get_dictionary() if w.is_verb]
    nouns = [w.word for w in env._jericho.get_dictionary() if w.is_noun]
    actions = [' '.join(tup) for tup in itertools.product(verbs, nouns)]

    for step in tqdm.trange(episode_len):
        act = np.random.choice(actions)
        print(step, act)
        game_state, score, done = env.step(act)
        if done:
            break
    scores.append(score)
    env.close()  # release the interpreter before starting the next episode

print(scores)
MarcCote commented 5 years ago

Thank you for reporting this issue. I'm not surprised Jericho has the same issue, since TextWorld uses Jericho under the hood. If I remember correctly, @mhauskn is aware of it, but I don't know whether we have a solution for it.

mhauskn commented 5 years ago

Here's an attempt to recreate the issue in Jericho alone:

from jericho import *
import itertools
import tqdm
import numpy as np

np.random.seed(0)
episodes = 50
episode_len = 500
ROM = '/home/mahauskn/workspace/text-agents/roms/balances.z5'  # path to a local copy of Balances

scores = []

for episode in range(episodes):
    print('episode {}'.format(episode))
    env = FrotzEnv(ROM, 0)  # second argument is the random seed
    env.reset()

    # Build every verb-noun pair from the game's dictionary.
    verbs = [w.word for w in env.get_dictionary() if w.is_verb]
    nouns = [w.word for w in env.get_dictionary() if w.is_noun]
    actions = [' '.join(tup) for tup in itertools.product(verbs, nouns)]

    for step in tqdm.trange(episode_len):
        act = np.random.choice(actions)
        print(step, act)
        game_state, score, done, _ = env.step(act)
        if done:
            break
    scores.append(score)
    env.close()  # free the emulator before starting the next episode

print(scores)

Running this script, I observe a fatal error:

episode 0
  0%|          | 0/500 [00:00<?, ?it/s]
0 burn chasm
1 taste them
2 swim valley
3 turn floor
4 leave scrolls
5 put buck-too
6 ride barker
7 swallow helistars
8 l seven
9 unlock gold
10 fondle winged
11 full x
12 wait gorse
13 sniff every
14 flip square
15 hear sixteen
16 ride ne
17 insert lighted
18 switch twelve
19 x s
20 murder nw
21 shed northeast
  4%|▍         | 22/500 [00:00<00:02, 214.07it/s]
22 hear both
23 pick carpet
24 smash carpet
25 cut east
26 slice grimoire
27 bother then
28 climb scroll
29 sing n
30 destroy chasm
31 i daffodils
32 blow pile
33 describe an
34 brief comma,
35 verbose shiny
36 no self
37 switch great
38 restore feather
39 say then
40 attach eight
41 wear floor
42 c,cast cedarwood
43 full door
44 noscript bazaar
45 go pans
  9%|▉         | 46/500 [00:00<00:02, 220.78it/s]
46 inside bozbar
47 straddle her

Fatal error: Illegal object

Is this consistent with what you're seeing @vzhong?

mhauskn commented 5 years ago

I've verified that the same behavior is present in Frotz (the base emulator used by Jericho) and have filed an issue there (https://gitlab.com/DavidGriffith/frotz/issues/111).

It's a valid point that RL environments shouldn't hang or produce fatal errors for what should be valid actions.

vzhong commented 5 years ago

Hey @mhauskn, there are a variety of ways in which these failures can manifest. I haven't seen "Fatal error: Illegal object" from Jericho before, but I have seen segfaults.

I'm not even certain that the problem lies in Frotz. We are wrapping environments (TextWorld) around environments (Jericho) around environments (Frotz) around environments (ZMachine). There are probably fatal bugs in ZMachine (which was not written to support randomly behaving RL agents), and it's probably overly optimistic to hope to get those fixed. The problem for TextWorld users is that a crash kills the entire Python process, which is not ideal for RL. It's probably worth doing something on the TextWorld end to handle these failure cases.

One simulator-agnostic approach is to run the simulator (e.g. Jericho or Frotz) in a separate process, so that each command can be issued with a timeout. The simulator can then fail or time out without affecting TextWorld. When that happens, TextWorld can tell the user that the simulator died, and the user can decide what to do (e.g. discard the trajectory, or assign a negative reward for killing the simulator).
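A minimal sketch of that idea using only the standard library, assuming a POSIX system (it uses the `fork` start method so the environment factory does not need to be picklable) and assuming `reset()`/`step()` return picklable values. `SafeEnv`, `SimulatorDied` and the `make_env` factory are hypothetical names, not part of TextWorld or Jericho. A hang shows up as no reply within `timeout`; a segfault or fatal error shows up as EOF on the pipe. In both cases the child is killed and the parent keeps running.

```python
import multiprocessing as mp


class SimulatorDied(RuntimeError):
    """Raised when the wrapped simulator hangs, crashes, or exits unexpectedly."""


def _serve(conn, make_env):
    # Child process: build the simulator, then answer (method, arg) requests.
    env = make_env()
    while True:
        method, arg = conn.recv()
        if method == "close":
            close = getattr(env, "close", None)
            if callable(close):
                close()
            break
        # An exception or segfault here ends the child; the parent then sees EOF.
        conn.send(env.reset() if method == "reset" else env.step(arg))


class SafeEnv:
    """Hypothetical proxy: forwards reset()/step() to a simulator in a child process."""

    def __init__(self, make_env, timeout=5.0):
        self.timeout = timeout
        ctx = mp.get_context("fork")  # POSIX only; make_env need not be picklable
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=_serve, args=(child_conn, make_env), daemon=True)
        self._proc.start()
        child_conn.close()  # only the child keeps this end open

    def _call(self, method, arg=None):
        try:
            self._conn.send((method, arg))
            # poll() also returns True when the child has died (EOF is readable).
            if self._conn.poll(self.timeout):
                return self._conn.recv()
            reason = "no reply within {}s".format(self.timeout)
        except (EOFError, OSError):
            reason = "simulator process exited"
        self._kill()
        raise SimulatorDied(reason)

    def _kill(self):
        if self._proc.is_alive():
            self._proc.kill()  # SIGKILL also stops an interpreter stuck in a loop
        self._proc.join()
        self._conn.close()

    def reset(self):
        return self._call("reset")

    def step(self, command):
        return self._call("step", command)

    def close(self):
        if self._proc.is_alive():
            try:
                self._conn.send(("close", None))
            except OSError:
                pass
            self._proc.join(self.timeout)
        self._kill()
```

For the Jericho script above this might look like `SafeEnv(lambda: FrotzEnv(ROM, 0), timeout=10.0)`, with `SimulatorDied` caught in the rollout loop so the user can discard or penalize the trajectory. The price is one pipe round trip (plus pickling) per step.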

mhauskn commented 5 years ago

I've updated Jericho (now version 1.2.0) to avoid crashing the Python process when a game encounters a fatal error. Instead, the episode will be reported as over and the emulator status will be marked as halted.
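With that change, `done` alone no longer distinguishes a real game over from an emulator halt. A minimal sketch of a rollout that reports both, assuming the halted status is exposed as an `emulator_halted()` method on `FrotzEnv`; that method name is an assumption, so the sketch looks it up defensively and reports `halted=False` if it is missing. `rollout` is a hypothetical helper, and it keeps the 4-tuple `step()` unpacking used in the Jericho script above.

```python
def rollout(env, actions, max_steps, rng):
    """Random rollout that separates a normal game over from an emulator halt.

    Returns (score, halted). `emulator_halted` is an ASSUMED FrotzEnv method;
    it is looked up with getattr so the sketch degrades to halted=False.
    """
    env.reset()
    score = 0
    for _ in range(max_steps):
        # Same unpacking as the Jericho script: (observation, score, done, info).
        _, score, done, _ = env.step(rng.choice(actions))
        if done:
            break
    check = getattr(env, "emulator_halted", None)
    halted = bool(check()) if callable(check) else False
    return score, halted
```

A halted episode can then be dropped or penalized instead of being counted as an ordinary game over.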

This change should address the issue with Balances, but will not fix games that hang or segfault. Running your script across all supported Jericho games revealed 2 hangs and 3 fatal errors. The 3 fatal errors should be addressed by this update, but for now I'd advise avoiding the showverb command, which causes hangs in the other 2 games. Please open issues with Jericho if you encounter other hangs or segfaults.
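One way to follow that advice in the random-action scripts above is to drop known-bad verbs before building the verb-noun product. A minimal sketch: `build_actions` is a hypothetical helper, and the banned list only contains `showverb` because that is the command named here; other problem commands would have to be added as they are found.

```python
import itertools


def build_actions(verbs, nouns, banned_verbs=("showverb",)):
    """Verb-noun action list as in the scripts above, minus verbs known to hang.

    The default banned list is illustrative, based on the advice to avoid
    showverb; extend it as other problem commands turn up.
    """
    banned = set(banned_verbs)
    usable = [v for v in verbs if v not in banned]
    return [" ".join(pair) for pair in itertools.product(usable, nouns)]
```

In the scripts, this would replace the `actions = [...]` line, fed with the same `verbs` and `nouns` lists taken from `get_dictionary()`.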

vzhong commented 5 years ago

Thank you @mhauskn!! I'll take a look soon.

MarcCote commented 5 years ago

Thanks @mhauskn, now I'll need to think about the best way to expose that information in TextWorld.

Regarding the hangs, what @vzhong proposes (wrapping the simulator in a subprocess) is probably the safest approach, but the drawback is the overhead of inter-process communication. If I remember correctly, Jericho's main focus is speed.
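To get a rough idea of that overhead on a given machine, one can time request/reply round trips to a child process that just echoes messages back. This is a sketch under the same POSIX `fork` assumption as a fork-based wrapper would use; `ipc_round_trip_seconds` is a hypothetical helper. It measures only pipe and pickling cost for a tiny message (real observations are larger). The result can be compared with the roughly 214 to 220 steps per second that tqdm reports in the log above, which already includes printing each action.

```python
import multiprocessing as mp
import time


def _echo(conn):
    # Child: send every message straight back until told to stop (None).
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(msg)


def ipc_round_trip_seconds(n=1000):
    """Average wall-clock cost of one request/reply over a multiprocessing Pipe."""
    ctx = mp.get_context("fork")  # POSIX only
    parent, child = ctx.Pipe()
    proc = ctx.Process(target=_echo, args=(child,), daemon=True)
    proc.start()
    child.close()
    start = time.perf_counter()
    for _ in range(n):
        parent.send(("step", "look"))  # a small command, similar to one step() call
        parent.recv()
    elapsed = time.perf_counter() - start
    parent.send(None)  # ask the child to exit
    proc.join()
    parent.close()
    return elapsed / n
```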

That said, in TextWorld we already use subprocesses to handle communication with the git-glulx interpreter, so we could do the same for Jericho.
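TextWorld's actual git-glulx code is not shown in this thread, so the following is only a generic sketch of the pattern: drive a line-based interpreter binary over stdin/stdout and treat "no prompt before the deadline" as a hang and EOF as a crash. It assumes POSIX (`select` on a pipe), an interpreter that prints a `>` prompt after each response, and that `>` does not otherwise end a chunk of output. The `PipedInterpreter` and `InterpreterDied` names and the `argv` you pass in are hypothetical.

```python
import os
import select
import subprocess
import time


class InterpreterDied(RuntimeError):
    """Raised when the interpreter hangs past the timeout or exits."""


class PipedInterpreter:
    """Hypothetical driver for a line-based interpreter over pipes (POSIX select)."""

    def __init__(self, argv, prompt=b">", timeout=5.0):
        self.prompt, self.timeout = prompt, timeout
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def _read_until_prompt(self):
        fd, buf = self.proc.stdout.fileno(), b""
        deadline = time.monotonic() + self.timeout
        while not buf.endswith(self.prompt):
            remaining = max(deadline - time.monotonic(), 0)
            ready = select.select([fd], [], [], remaining)[0]
            if not ready:  # deadline passed with no output: treat as a hang
                self.proc.kill()
                self.proc.wait()
                raise InterpreterDied("no prompt within {}s".format(self.timeout))
            chunk = os.read(fd, 4096)
            if not chunk:  # EOF: the interpreter exited or crashed
                raise InterpreterDied("exit code {}".format(self.proc.wait()))
            buf += chunk
        return buf[: -len(self.prompt)].decode()

    def start(self):
        # Read the banner printed before the first prompt.
        return self._read_until_prompt()

    def send(self, command):
        try:
            self.proc.stdin.write(command.encode() + b"\n")
            self.proc.stdin.flush()
        except BrokenPipeError:
            raise InterpreterDied("exit code {}".format(self.proc.wait()))
        return self._read_until_prompt()

    def close(self):
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
        for pipe in (self.proc.stdin, self.proc.stdout):
            try:
                pipe.close()
            except OSError:
                pass
```

Compared with a `multiprocessing` wrapper around an in-process library, this only applies when the emulator runs as a separate executable, but it gets crash isolation for free because the interpreter never shares the Python process.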

mhauskn commented 5 years ago

I'm avoiding implementing subprocess wrapping in Jericho for now. My hope is that the segfaults and hangs can be fixed. If fixing them turns out not to be feasible, I may reconsider in the future. That said, I totally understand if TW wants to subprocess-wrap Jericho for added safety.