Closed oubiwann closed 5 years ago
One thing we need to decide is which sentence type to parse. User commands can be interpreted either as imperative statements ("pick up the book") or as implicit first-person narrative ("I pick up the book"). OpenNLP parses the two forms differently.
Compare:
{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "PRT"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["up" "RP"]
          ["the" "DT"]
          ["book" "NN"]
          ["." "."])}
with:
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["." "."])}
Then, contrast these with the following:
{:chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["I" "PRP"]
          ["pick" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["up" "RB"]
          ["." "."])}

{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["up"] :tag "ADVP"})
 :tagged (["Pick" "VBG"]
          ["the" "DT"]
          ["book" "NN"]
          ["up" "RB"]
          ["." "."])}
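In the comparisons above, the first-person form gets the verb tagged as VBP, while the capitalised imperative form gets "Pick" tagged as VBG. If we go with first-person narrative, one option is to rewrite each command before handing it to the parser. A minimal sketch, assuming a hypothetical helper named `->first-person` (not part of the existing code) and assuming commands are single sentences:

```clojure
(require '[clojure.string :as string])

(defn ->first-person
  "Hypothetical helper: rewrite an imperative command as first-person
  narrative, e.g. \"Pick up the book.\" becomes \"I pick up the book.\""
  [command]
  ;; Trim whitespace and drop trailing sentence punctuation; a single
  ;; period is re-added below.
  (let [body (-> command string/trim (string/replace #"[.!?]+$" ""))]
    (cond
      ;; Nothing to rewrite.
      (string/blank? body) ""
      ;; Already first person ("I" followed by whitespace): only normalise
      ;; the final punctuation.
      (re-find #"(?i)^I\s" body) (str body ".")
      ;; Lower-case just the first character so that any proper nouns later
      ;; in the sentence keep their capitalisation.
      :else (str "I " (string/lower-case (subs body 0 1)) (subs body 1) "."))))
```

The result could then be passed to `nlp/parse` as usual. Whether this is the right normalisation is an open question; it is only meant to make the trade-off concrete.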
Tagged part-of-speech references:
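For quick reference while reading these outputs: the word-level tags are Penn Treebank part-of-speech tags, and the chunk tags appear to follow the CoNLL-2000 chunking conventions. The lookup tables below cover only the tags that show up in this thread; the names `pos-tags`, `chunk-tags`, and `describe-tagged` are made up for illustration.

```clojure
(def pos-tags
  "Penn Treebank part-of-speech tags that appear in this thread."
  {"DT"  "determiner"
   "IN"  "preposition or subordinating conjunction"
   "JJ"  "adjective"
   "NN"  "noun, singular or mass"
   "PRP" "personal pronoun"
   "RB"  "adverb"
   "RP"  "particle"
   "VB"  "verb, base form"
   "VBG" "verb, gerund or present participle"
   "VBP" "verb, non-3rd person singular present"
   "."   "sentence-final punctuation"})

(def chunk-tags
  "Chunk (phrase) tags that appear in this thread."
  {"NP"   "noun phrase"
   "VP"   "verb phrase"
   "PP"   "prepositional phrase"
   "PRT"  "particle"
   "ADVP" "adverb phrase"
   "SBAR" "subordinate clause"})

(defn describe-tagged
  "Replace each tag in a :tagged sequence with its description.
  Unknown tags are passed through unchanged."
  [tagged]
  (map (fn [[word tag]] [word (get pos-tags tag tag)]) tagged))
```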
We are going to want to capitalize, though:
{:chunked ({:phrase ["pick"] :tag "VP"}
           {:phrase ["up"] :tag "ADVP"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["pick" "VB"] ["up" "RP"] ["the" "DT"] ["book" "NN"] ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"})
 :tagged (["Pick" "VBG"] ["up" "IN"] ["the" "DT"] ["book" "NN"] ["." "."])}
These might serve as a good starting place for thinking about converting a parsed sentence into an AST:
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book from the table.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "table"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["table" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Pick up the book off of the floor.")
{:chunked ({:phrase ["Pick"] :tag "VP"}
           {:phrase ["up"] :tag "SBAR"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["off"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "floor"] :tag "NP"})
 :tagged (["Pick" "VBG"]
          ["up" "IN"]
          ["the" "DT"]
          ["book" "NN"]
          ["off" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["floor" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book from the shelf.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["from"] :tag "PP"}
           {:phrase ["the" "shelf"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["from" "IN"]
          ["the" "DT"]
          ["shelf" "NN"]
          ["." "."])}
[hxgm30.language.repl] λ=> (nlp/parse "Take the book out of the open chest.")
{:chunked ({:phrase ["Take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["Take" "VB"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])}
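One pattern that stands out in the four examples above is that every location is expressed as one or more PP chunks followed by an NP chunk ("from" + "the table", "out" "of" + "the open chest"). As a starting point for the AST, that pairing could be pulled out of the `:chunked` data directly. This is only a sketch: `chunks->relations` and the `:prep`/`:noun-phrase` keys are invented here and are not part of the parser.

```clojure
(defn chunks->relations
  "Walk the chunks left to right, accumulating consecutive PP chunks and
  attaching them to the next NP chunk. Returns a vector of
  {:prep [...] :noun-phrase [...]} maps."
  [chunked]
  (loop [chunks chunked
         prep []
         relations []]
    (if-let [{:keys [phrase tag]} (first chunks)]
      (case tag
        ;; Collect multi-word prepositions such as "out of".
        "PP" (recur (rest chunks) (into prep phrase) relations)
        ;; An NP closes a pending preposition; an NP with no preposition
        ;; in front of it (the direct object) is skipped.
        "NP" (recur (rest chunks)
                    []
                    (if (seq prep)
                      (conj relations {:prep prep :noun-phrase phrase})
                      relations))
        ;; Any other chunk type discards a pending preposition.
        (recur (rest chunks) [] relations))
      relations)))
```

Applied to the `:chunked` value for "Take the book out of the open chest.", this yields a single relation with `:prep ["out" "of"]` and `:noun-phrase ["the" "open" "chest"]`.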
I think a good place to start might be to keep it simple and strip out everything but nouns and verbs.
Compare the simplified versions:
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book from the table." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["table" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Pick up the book off of the floor." {:simple? true}))
(["Pick" "VBG"] ["book" "NN"] ["floor" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book from the shelf." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["shelf" "NN"])
[hxgm30.language.repl] λ=> (:tagged (nlp/parse "Take the book out of the open chest." {:simple? true}))
(["Take" "VB"] ["book" "NN"] ["chest" "NN"])
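The filtering behind the simplified output can be approximated in a few lines. The sketch below is not the code behind `{:simple? true}`; it just assumes that "nouns and verbs" means any Penn Treebank tag starting with `NN` or `VB`, and the function names are placeholders.

```clojure
(require '[clojure.string :as string])

(defn noun-or-verb?
  "True when a [word tag] pair carries a noun (NN, NNS, NNP, NNPS) or
  verb (VB, VBD, VBG, VBN, VBP, VBZ) tag."
  [[_word tag]]
  (or (string/starts-with? tag "NN")
      (string/starts-with? tag "VB")))

(defn simplify
  "Keep only the nouns and verbs from a :tagged sequence."
  [tagged]
  (filter noun-or-verb? tagged))
```

Because the filter keys on the tag alone, determiners, prepositions, adjectives, and pronouns all drop out, which is why the four sentences above collapse to the same verb/noun/noun shape.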
Parse output now looks like this:
{:ast {:action "take" :object "book" :relations ("chest")}
 :chunked ({:phrase ["I"] :tag "NP"}
           {:phrase ["take"] :tag "VP"}
           {:phrase ["the" "book"] :tag "NP"}
           {:phrase ["out"] :tag "PP"}
           {:phrase ["of"] :tag "PP"}
           {:phrase ["the" "open" "chest"] :tag "NP"})
 :tagged (["I" "PRP"]
          ["take" "VBP"]
          ["the" "DT"]
          ["book" "NN"]
          ["out" "IN"]
          ["of" "IN"]
          ["the" "DT"]
          ["open" "JJ"]
          ["chest" "NN"]
          ["." "."])
 :tokens ["I" "take" "the" "book" "out" "of" "the" "open" "chest" "."]}
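For anyone following along, here is one way an `:ast` of this shape could be derived from the `:tagged` data. It is a naive reading of the output above, not the actual implementation: it assumes the first verb is the action, the first noun is the object, and every remaining noun is a relation. `tagged->ast` is a made-up name.

```clojure
(require '[clojure.string :as string])

(defn tagged->ast
  "Build a minimal {:action :object :relations} map from a :tagged
  sequence of [word tag] pairs."
  [tagged]
  (let [verb? (fn [[_ tag]] (string/starts-with? tag "VB"))
        noun? (fn [[_ tag]] (string/starts-with? tag "NN"))
        verbs (map first (filter verb? tagged))
        nouns (map first (filter noun? tagged))]
    {;; Lower-case the verb so "Take" and "take" produce the same action;
     ;; some-> leaves :action nil when no verb was found.
     :action (some-> (first verbs) string/lower-case)
     :object (first nouns)
     :relations (rest nouns)}))
```

Fed the `:tagged` value shown above, this produces `:action "take"`, `:object "book"`, and `"chest"` as the only relation. It ignores the prepositions entirely, so "out of the chest" and "into the chest" would come out the same.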
This is an interesting problem to solve, and one where I expect the first iteration will be quite naïve. I suspect: