Devise data structure (AST) for the parsed results of user-supplied commands

oubiwann commented 5 years ago

This is an interesting problem to solve, and I expect the first iteration to be quite naïve. My working assumptions:

The verb will be the root node of the tree
Objects in the world will support protocols (à la Java interfaces), so each object can report whether it supports the action indicated by the verb
Deeper structure will only be revealed through experimentation
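
As a rough illustration of the protocol idea, here is a minimal sketch in Clojure. Every name in it (`Takeable`, `take-it`, `Book`, `Mountain`, `verb->protocol`, `supports?`) is hypothetical and not taken from the hexagram30 code base; it only shows how an object could report whether it supports the action named by a verb.

```clojure
;; Hypothetical protocol: anything that can be picked up implements it.
(defprotocol Takeable
  (take-it [this actor] "Return `actor` with this object added to its inventory."))

;; A book can be taken ...
(defrecord Book [title]
  Takeable
  (take-it [this actor]
    (update actor :inventory conj this)))

;; ... a mountain cannot, so it does not implement the protocol.
(defrecord Mountain [name])

;; Hypothetical lookup from a command verb to the protocol it requires.
(def verb->protocol
  {"take" Takeable
   "pick" Takeable})

(defn supports?
  "True when `obj` implements the protocol registered for `verb`;
  false for unknown verbs."
  [verb obj]
  (if-let [protocol (verb->protocol verb)]
    (satisfies? protocol obj)
    false))
```

With these definitions, `(supports? "take" (->Book "Atlas"))` is true, while `(supports? "take" (->Mountain "Doom"))` and any call with an unregistered verb are false.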

oubiwann commented 5 years ago

One thing that needs to be decided is which sentence type to parse. User commands can be interpreted either as imperative statements ("pick up the book") or as implicit first-person narrative ("I pick up the book"). OpenNLP parses the two forms differently.

Compare:

{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "PRT"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["up" "RP"]
          ["the" "DT"]
          ["book" "NN"]
          ["." "."])}

with:

{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] 
          ["up" "IN"] 
          ["the" "DT"] 
          ["book" "NN"] 
          ["." "."])}

Then, contrast these with the following:

{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["up" "RB"]
          ["." "."])}

{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["Pick" "VBG"] 
          ["the" "DT"] 
          ["book" "NN"] 
          ["up" "RB"] 
          ["." "."])}

Tagged part-of-speech references:

oubiwann commented 5 years ago

We are going to want to capitalize, though:

[hxgm30.language.repl] λ=> (nlp/parse "pick up the book.")
{:chunked ({:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "ADVP"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["pick" "VB"] ["up" "RP"] ["the" "DT"] ["book" "NN"] ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] ["up" "IN"] ["the" "DT"] ["book" "NN"] ["." "."])}
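
Since the tagger output above changes with the case of the first word (VB/RP for "pick" versus VBG/IN for "Pick"), one option is to normalize the command text before parsing. The sketch below is an assumption, not code from this project: `normalize-command` is a hypothetical helper that lowercases the first word and, if the first-person form is chosen, prefixes the implicit subject "I".

```clojure
(require '[clojure.string :as string])

(defn normalize-command
  "Hypothetical pre-parse step: turn an imperative command into an
  implicit first-person sentence. Input that already starts with \"I\"
  is returned trimmed but otherwise unchanged."
  [input]
  (let [trimmed (string/trim input)
        [head & tail] (string/split trimmed #"\s+")]
    (cond
      (string/blank? trimmed) ""
      (= "I" head) trimmed
      :else (string/join " " (concat ["I" (string/lower-case head)] tail)))))
```

For example, `(normalize-command "Pick up the book.")` yields `"I pick up the book."`, which could then be handed to `nlp/parse`.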

oubiwann commented 5 years ago

These might serve as a good starting place for thinking about converting a parsed sentence into an AST:

[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book from the table.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "table"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["table" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book off of the floor.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["off"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "floor"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["off" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["floor" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book from the shelf.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "shelf"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["shelf" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book out of the open chest.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])}
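
One way to read the chunked output above is: the first VP names the action, the first NP after it is the object, and each later NP is a related object introduced by the PP chunks that precede it. The following is only a sketch under that assumption; `chunks->outline` and the shape of the map it returns are hypothetical.

```clojure
(require '[clojure.string :as string])

(defn chunks->outline
  "Walk the :chunked seq of a parse result. Chunks before the first VP
  (such as the subject \"I\") are dropped; chunks that are neither NP
  nor PP (such as SBAR or PRT) are skipped."
  [chunked]
  (let [[verb & others] (drop-while #(not= "VP" (:tag %)) chunked)]
    (loop [chunks others
           preps []   ; prepositions seen since the last NP
           nps []]    ; noun phrases found so far
      (if-let [{:keys [phrase tag]} (first chunks)]
        (case tag
          "PP" (recur (rest chunks) (into preps phrase) nps)
          "NP" (recur (rest chunks)
                      []
                      (conj nps {:prepositions preps :phrase phrase}))
          (recur (rest chunks) preps nps))
        {:action (some-> verb :phrase first string/lower-case)
         :object (:phrase (first nps))
         :relations (vec (rest nps))}))))
```

Applied to the chunks of "Take the book out of the open chest.", this gives the action `"take"`, the object `["the" "book"]`, and a single relation whose prepositions are `["out" "of"]` and whose phrase is `["the" "open" "chest"]`.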

oubiwann commented 5 years ago

I think a good place to start is to keep it simple and strip out everything but nouns and verbs. This approach:

assumes there's only one verb
treats the first noun as the object of the action
uses any additional nouns to set that noun in context (i.e., to help uniquely identify the noun in question)
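
The stripping step can be sketched as a filter over the `:tagged` pairs. This is an illustration rather than the project's implementation: `noun-or-verb?` and `simplify` are hypothetical names, and the sketch assumes Penn Treebank tags, where noun tags start with "NN" and verb tags start with "VB".

```clojure
(require '[clojure.string :as string])

(defn noun-or-verb?
  "True for [word tag] pairs whose tag is a noun (NN*) or verb (VB*) tag."
  [[_word tag]]
  (or (string/starts-with? tag "NN")
      (string/starts-with? tag "VB")))

(defn simplify
  "Keep only the nouns and verbs from a seq of [word tag] pairs."
  [tagged]
  (filter noun-or-verb? tagged))
```

Note that pronouns such as "I" (tagged PRP) are dropped along with determiners, prepositions, adjectives, and punctuation.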

oubiwann commented 5 years ago

Compare the simplified versions:

[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book from the table." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["table" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book off of the floor." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["floor" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book from the shelf." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["shelf" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book out of the open chest." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["chest" "NN"])

oubiwann commented 5 years ago

Parse output now looks like this:

{:ast {:action "take" :object "book" :relations ("chest")}
 :chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["take" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])
 :tokens ["I" "take" "the" "book" "out" "of" "the" "open" "chest" "."]}
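
For reference, an `:ast` map of this shape can be derived from the `:tagged` pairs alone. The sketch below is not the project's code: `tagged->ast` is a hypothetical name, and it relies on the assumptions stated earlier in this thread (one verb, the first noun is the object, remaining nouns are relations).

```clojure
(require '[clojure.string :as string])

(defn tagged->ast
  "Build {:action :object :relations} from a seq of [word tag] pairs,
  assuming Penn Treebank tags (VB* for verbs, NN* for nouns)."
  [tagged]
  (let [words-tagged (fn [prefix]
                       (for [[word tag] tagged
                             :when (string/starts-with? tag prefix)]
                         word))
        verbs (words-tagged "VB")
        nouns (words-tagged "NN")]
    {:action (some-> (first verbs) string/lower-case)
     :object (first nouns)
     :relations (rest nouns)}))
```

Given the `:tagged` value shown above, this returns a map equal to `{:action "take" :object "book" :relations ("chest")}`.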
