microsoft / TextWorld

TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

Accessing oracle policy commands for tw-cooking games #272

Closed: vmicheli closed this issue 3 years ago

vmicheli commented 3 years ago

Hey,

I'm unable to access the oracle policy commands for tw-cooking games, which were introduced a couple of months ago in https://github.com/microsoft/TextWorld/pull/261

I generated a game with:

tw-make tw-cooking --recipe 3 --take 3 --cook --cut --open --go 12 --split train --output tw_games/tw-game.z8 --seed 11985

and tried to play it with:

tw-play --hint tw_games/tw-game.z8

but oracle policy commands are not displayed.

Am I doing something wrong with the game generation or the play command?

MarcCote commented 3 years ago

Weird! I just tried your two commands on the master branch and get:

Oracle: [0/11|(0): take red hot pepper > go east > open screen door > cook red hot pepper with BBQ > go east > go north > take red apple from counter > cook red apple with stove > take knife from counter > slice red apple with knife > chop red hot pepper with knife > open fridge > take white onion from fridge > go south > go west > cook white onion with BBQ > chop white onion with knife > go east > go north > prepare meal > eat meal]
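If you want the oracle's suggestions as data rather than on screen (for example, to log them next to your own moves), the hint line can be split into individual commands. This is a minimal sketch, assuming the hint keeps the layout shown above: a bracketed prefix such as "0/11|(0)", then "): ", then commands separated by " > ". The function name parse_oracle_hint is hypothetical, and the counters in the prefix are deliberately not interpreted.

```python
from typing import List


def parse_oracle_hint(line: str) -> List[str]:
    """Return the commands listed in a tw-play hint line such as
    'Oracle: [0/11|(0): take red hot pepper > go east > ...]'.

    Assumption: the text after the first '): ' inside the brackets is a
    ' > '-separated list of commands. The counters before it are ignored.
    """
    text = line.strip()
    # Keep only what sits between the outermost square brackets.
    body = text[text.index("[") + 1 : text.rindex("]")]
    # Drop the counter prefix; everything after the first '): ' is commands.
    _, sep, commands = body.partition("): ")
    if not sep:
        raise ValueError("unexpected hint format: " + line)
    return [cmd.strip() for cmd in commands.split(" > ") if cmd.strip()]


if __name__ == "__main__":
    hint = "Oracle: [0/11|(0): take red hot pepper > go east > open screen door]"
    commands = parse_oracle_hint(hint)
    print(commands)  # ['take red hot pepper', 'go east', 'open screen door']
    print("examine cookbook" in commands)  # False
```

Applied to the full trajectory above, this yields 21 commands, none of which is "examine cookbook".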

Edit: master is in sync with 1.4.3

vmicheli commented 3 years ago

I just tried the commands again and it works now. That's even weirder ahah.

Anyway, it's time for some long-range language modeling. I'll let you know if I get interesting results!

MarcCote commented 3 years ago

Ok. Let me know if that happens again. Maybe there's a stochastic bug hidden in the oracle's trajectory computation! Also, as you can see, the oracle assumes initial knowledge of the recipe (I couldn't find a workaround yet).

Something that occurred to me while writing this is that we could split the game in two:

MarcCote commented 3 years ago

Hmm, something else I just noticed in the oracle's trajectory above: there's no "examine cookbook" step :(

vmicheli commented 3 years ago

At the moment I'm doing the data collection by playing the game myself with the assistance of the oracle. Hopefully only a few dozen demonstrations will be needed before moving on to RL.

But if we wanted to automate the data collection, then, as you pointed out, the agent would first need to find the cookbook (task 1), then examine it and proceed (task 2).
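The two-task split could also be applied to demonstrations after they have been recorded. This is a minimal sketch, assuming each episode is stored as a list of command strings and that task 1 ends right before the first "examine cookbook". The helper split_demonstration is hypothetical and is not part of TextWorld.

```python
from typing import List, Tuple


def split_demonstration(
    commands: List[str], pivot: str = "examine cookbook"
) -> Tuple[List[str], List[str]]:
    """Split one recorded episode into two sub-task demonstrations.

    Task 1: the commands before the first `pivot` (finding the cookbook).
    Task 2: `pivot` and everything after it (reading the recipe and proceeding).
    Raises ValueError when the episode never issues `pivot`.
    """
    # Compare case- and whitespace-insensitively, but return the original strings.
    target = pivot.strip().lower()
    normalized = [cmd.strip().lower() for cmd in commands]
    if target not in normalized:
        raise ValueError(f"{pivot!r} not found; cannot split this episode")
    i = normalized.index(target)
    return commands[:i], commands[i:]


if __name__ == "__main__":
    # Made-up episode, for illustration only.
    episode = ["go east", "examine cookbook", "take red apple from counter",
               "prepare meal", "eat meal"]
    task1, task2 = split_demonstration(episode)
    print(task1)  # ['go east']
    print(task2)  # ['examine cookbook', 'take red apple from counter', 'prepare meal', 'eat meal']
```

An episode that never examines the cookbook, like the oracle trajectory above, raises ValueError, so such episodes would need the missing step before they can be split.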