karthikncode / text-world-player

Framework and model code for the paper "Language Understanding for Text-based Games using Deep Reinforcement Learning", EMNLP 2015
http://arxiv.org/pdf/1506.08941v2.pdf
MIT License

Can you explain how you deal with illegal actions? #6

Open euwern opened 6 years ago

euwern commented 6 years ago

In your paper, you mention that the action scorer module produces two outputs: one over actions (e.g. "go", "eat") and one over objects (e.g. "east", "apple"). I wonder how your architecture deals with illegal actions, as in the following example. Given a state s, the possible actions are:

- a1: eat apple
- a2: go east

However, the action scorer scores every action word ("go", "eat") and every object word ("east", "apple"), which results in 4 possible action-object pairs:

- a1: eat apple (legal action) --> score: 0.9
- a2: go east (legal action) --> score: 0.08
- a3: eat east (illegal action) --> score: 0.01
- a4: go apple (illegal action) --> score: 0.01

In such a scenario, how does your architecture deal with illegal actions? Do you just look up a table containing only the legal actions?
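
To make the "look up only legal actions" option concrete, here is a minimal sketch of what such a lookup could look like. It only illustrates the question and is not taken from the paper or this repository: the names `scores`, `legal`, and `best_legal_action` are hypothetical, and the numbers are the made-up scores from the example.

```python
# Hypothetical joint scores for every (action, object) pair, copied from
# the example in this issue. They are illustrative, not model outputs.
scores = {
    ("eat", "apple"): 0.9,
    ("go", "east"): 0.08,
    ("eat", "east"): 0.01,
    ("go", "apple"): 0.01,
}

# Assumed lookup table of the pairs that are legal in the current state.
legal = {("eat", "apple"), ("go", "east")}


def best_legal_action(scores, legal):
    """Return the highest-scoring pair among the legal ones only."""
    candidates = {pair: s for pair, s in scores.items() if pair in legal}
    if not candidates:
        raise ValueError("no legal action available in this state")
    return max(candidates, key=candidates.get)


print(best_legal_action(scores, legal))
```

This assumes the environment exposes the set of legal pairs for each state, which may not hold for every text-based game.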