hexagram30 / language

A syntagmata and Markov chain language, word, and name generator for use in hexagram30 narratives

Devise data structure (AST) for the parsed results of user-supplied commands #37

Closed: oubiwann closed this issue 5 years ago

oubiwann commented 5 years ago

This is an interesting problem to solve, and one where the first iteration will likely be quite naïve. I suspect:

  1. The verb will be the root node of the tree
  2. Objects in the world will support protocols (à la Java interfaces), and can therefore report whether they support the action indicated by the verb
  3. Deeper structure will only be revealed through experimentation
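
As a rough sketch of point 2, assuming Clojure protocols stand in for the interfaces mentioned above. Takeable, Book, Mountain, verb->protocol and supports? are all hypothetical names used only for illustration; none of them come from the hexagram30 code:

```clojure
;; Hypothetical protocol: one capability an in-world object may support.
(defprotocol Takeable
  (take-from [this container] "Remove this object from the given container."))

;; A Book can be taken; a Mountain implements nothing.
(defrecord Book [title]
  Takeable
  (take-from [this container]
    {:taken (:title this) :from container}))

(defrecord Mountain [label])

;; The verb (the root of the tree) selects the protocol an object must satisfy.
(def verb->protocol
  {"take" Takeable
   "pick" Takeable})

(defn supports?
  "True when `obj` implements the protocol registered for `verb`.
  Unknown verbs are treated as unsupported."
  [verb obj]
  (if-let [protocol (verb->protocol verb)]
    (satisfies? protocol obj)
    false))

(supports? "take" (->Book "Atlas"))      ;; => true
(supports? "take" (->Mountain "Peak"))   ;; => false
```

The verb-to-protocol map is one possible design; dispatching on the verb with a multimethod would be another.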

oubiwann commented 5 years ago

One thing that needs to be decided is which sentence type to parse. User commands can be interpreted either as imperative statements ("pick up the book") or as implicit first-person narrative ("I pick up the book"). OpenNLP parses the two forms differently.

Compare:

{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "PRT"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["up" "RP"]
          ["the" "DT"]
          ["book" "NN"]
          ["." "."])}

with:

{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] 
          ["up" "IN"] 
          ["the" "DT"] 
          ["book" "NN"] 
          ["." "."])}

Then, contrast these with the following:

{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["up" "RB"]
          ["." "."])}

{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["Pick" "VBG"] 
          ["the" "DT"] 
          ["book" "NN"] 
          ["up" "RB"] 
          ["." "."])}

Tagged part-of-speech references:

oubiwann commented 5 years ago

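We are going to want to normalize capitalization, though. The lowercase verb is tagged as a base-form verb with a particle (VB, RP), while the capitalized form is tagged as a gerund with a preposition (VBG, IN):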

[hxgm30.language.repl] λ=> (nlp/parse "pick up the book.")
{:chunked ({:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "ADVP"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["pick" "VB"] ["up" "RP"] ["the" "DT"] ["book" "NN"] ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] ["up" "IN"] ["the" "DT"] ["book" "NN"] ["." "."])}
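
The two parses above differ only in the capitalization of the first word, so one option is to lower-case the first character before handing the sentence to the parser. A minimal sketch; lower-first is a hypothetical helper, not part of hxgm30.language, and it deliberately leaves a leading "I" alone so that first-person input is unaffected:

```clojure
(require '[clojure.string :as string])

(defn lower-first
  "Lower-case only the first character of `s`. The rest of the sentence is
  left untouched so proper nouns keep their capitals, and a leading
  pronoun \"I\" is preserved."
  [s]
  (cond
    (string/blank? s)   s
    (re-find #"^I\b" s) s
    :else               (str (string/lower-case (subs s 0 1)) (subs s 1))))

(lower-first "Pick up the book.")   ;; => "pick up the book."
(lower-first "I pick up the book.") ;; => "I pick up the book."

;; Assumed usage with the parser shown in this thread:
;; (nlp/parse (lower-first "Pick up the book."))
```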

oubiwann commented 5 years ago

These might serve as a good starting place for thinking about converting a parsed sentence into an AST:

[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book from the table.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "table"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["table" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book off of the floor.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["off"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "floor"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["off" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["floor" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book from the shelf.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "shelf"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["shelf" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book out of the open chest.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])}
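
One way to start turning the chunked output above into a tree might be to walk the chunks in order and attach each run of PP chunks to the NP that follows it. This is only a sketch under that assumption; group-chunks and the keys it returns (:subject, :verb, :particles, :object, :relations) are hypothetical and not taken from the project:

```clojure
(defn group-chunks
  "Fold a seq of {:phrase [...] :tag \"..\"} chunks into a small map.
  - VP sets the verb
  - PP phrases accumulate until the next NP, which becomes a relation
  - an NP before any verb is the subject; the first NP after it is the object
  - any other tag (PRT, SBAR, ADVP, ...) is kept as a verb particle"
  [chunks]
  (let [init {:subject nil :verb nil :particles [] :object nil
              :relations [] :pending []}
        step (fn [acc {:keys [phrase tag]}]
               (case tag
                 "VP" (assoc acc :verb phrase)
                 "PP" (update acc :pending into phrase)
                 "NP" (cond
                        (nil? (:verb acc))
                        (assoc acc :subject phrase)

                        (seq (:pending acc))
                        (-> acc
                            (update :relations conj {:prep (:pending acc)
                                                     :noun phrase})
                            (assoc :pending []))

                        (nil? (:object acc))
                        (assoc acc :object phrase)

                        :else acc)
                 ;; default: treat the phrase as a particle of the verb
                 (update acc :particles into phrase)))]
    (dissoc (reduce step init chunks) :pending)))

(group-chunks [{:phrase ["Take"] :tag "VP"}
               {:phrase ["the" "book"] :tag "NP"}
               {:phrase ["out"] :tag "PP"}
               {:phrase ["of"] :tag "PP"}
               {:phrase ["the" "open" "chest"] :tag "NP"}])
;; => {:subject nil, :verb ["Take"], :particles [], :object ["the" "book"],
;;     :relations [{:prep ["out" "of"], :noun ["the" "open" "chest"]}]}
```

Treating SBAR as a particle is a workaround for the mis-tagged "up" in the capitalized examples, not a general rule.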

oubiwann commented 5 years ago

I think a good place to start might be just keeping it simple: strip out everything but nouns and verbs:
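
A minimal sketch of that filtering step, assuming the Penn Treebank convention that noun tags start with "NN" and verb tags start with "VB". The function names are hypothetical; this is a guess at the idea, not the project's implementation:

```clojure
(require '[clojure.string :as string])

(defn noun-or-verb?
  "True for a [word tag] pair whose tag is a noun (NN, NNS, NNP, NNPS)
  or a verb (VB, VBD, VBG, VBN, VBP, VBZ)."
  [[_word tag]]
  (or (string/starts-with? tag "NN")
      (string/starts-with? tag "VB")))

(defn simplify
  "Keep only the nouns and verbs from a seq of [word tag] pairs."
  [tagged]
  (filter noun-or-verb? tagged))

(simplify [["Take" "VB"] ["the" "DT"] ["book" "NN"]
           ["from" "IN"] ["the" "DT"] ["shelf" "NN"] ["." "."]])
;; => (["Take" "VB"] ["book" "NN"] ["shelf" "NN"])
```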

oubiwann commented 5 years ago

Compare the simplified versions:

[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book from the table." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["table" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book off of the floor." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["floor" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book from the shelf." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["shelf" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book out of the open chest." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["chest" "NN"])

oubiwann commented 5 years ago

Parse output now looks like this:

{:ast {:action "take" :object "book" :relations ("chest")}
 :chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["take" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])
 :tokens ["I" "take" "the" "book" "out" "of" "the" "open" "chest" "."]}
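
For illustration, an :ast map of the shape shown above could be derived from the :tagged pairs roughly as follows. This is a sketch under the assumption that the first verb is the action, the first noun is the object, and any remaining nouns are relations; tagged->ast is a hypothetical name and the real implementation may differ:

```clojure
(require '[clojure.string :as string])

(defn tagged->ast
  "Build {:action .. :object .. :relations (..)} from [word tag] pairs.
  Verb tags start with \"VB\" and noun tags with \"NN\" (Penn Treebank)."
  [tagged]
  (let [words-tagged (fn [prefix]
                       (->> tagged
                            (filter (fn [[_ tag]] (string/starts-with? tag prefix)))
                            (map first)))
        [action]             (words-tagged "VB")
        [object & relations] (words-tagged "NN")]
    {:action    (some-> action string/lower-case)
     :object    object
     :relations (or relations ())}))

(tagged->ast [["I" "PRP"] ["take" "VBP"] ["the" "DT"] ["book" "NN"]
              ["out" "IN"] ["of" "IN"] ["the" "DT"] ["open" "JJ"]
              ["chest" "NN"] ["." "."]])
;; => {:action "take", :object "book", :relations ("chest")}
```

Note that this flat form drops the prepositions, so "out of the chest" and "into the chest" would produce the same :relations.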