turbopape / postagga

A Library to parse natural language in pure Clojure and ClojureScript
MIT License

Feature Request: OR logic for rule steps #13

Closed: milt closed this issue 6 years ago.

milt commented 7 years ago

Thanks for the great work on this so far; I'm having a blast trying it out. One thing I'd love to see is a way to express that one of a group of steps is required, rather than just a list of optional steps.

For instance, I have this working rule:

(def rules [{:id :statement
             :optional-steps []
             :rule [:actor
                    #{:get-value
                      #{"NN"} #{"NNP"} ;; singular, singular proper
                      #{"NNS"} #{"NNPS"}} ;; plural, plural proper
                    :verb
                    #{:get-value #{"VBD"}}
                    :object
                    #{:get-value
                      #{"NN"} #{"NNP"}
                      #{"NNS"} #{"NNPS"}}
                    #{#{"."}}]}])

"bob experienced enlightenment." => {:errors nil, :result {:rule :statement, :data {:actor ["bob"], :verb ["experienced"], :object ["enlightenment"]}}}

"people experienced enlightenment." => {:errors nil, :result {:rule :statement, :data {:actor ["people"], :verb ["experienced"], :object ["enlightenment"]}}}

Note that some of the POS tags differ from the postagga defaults; I'm plugging in the Stanford NLP library for tagging.

This rule finds simple actor - verb - object statements, where the actor and the object can each be singular or plural. However, the result map doesn't distinguish singular from plural. If I want to look specifically for singular or plural actors or objects, I have to either make both steps optional (which could produce a false hit with no actor or object), or branch and write more rules to cover every combination (singular actor - verb - singular object, plural actor - verb - singular object, etc.), which would get pretty large.

It would be great if I could specify something like an OR condition, either in the optional steps vector or the body of the rule, for example:


(def desired-rules
  [{:id :statement
    :optional-steps [;; grouping of two or more steps to indicate
                     ;; that one of them is required
                     [:actor-singular
                      :actor-plural]]
    :rule [:actor-singular
           #{:multi :get-value
             #{"NN"} #{"NNP"}} ;; singular, singular proper
           :actor-plural
           #{:multi :get-value
             #{"NNS"} #{"NNPS"}} ;; plural, plural proper
           :verb
           #{:get-value #{"VBD"}}
           :object
           #{:get-value
             #{"NN"} #{"NNP"}
             #{"NNS"} #{"NNPS"}}
           #{#{"."}}]}])

"bob experienced enlightenment." => {:errors nil, :result {:rule :statement, :data {:actor-singular ["bob"], :verb ["experienced"], :object ["enlightenment"]}}}

"people experienced enlightenment." => {:errors nil, :result {:rule :statement, :data {:actor-plural ["people"], :verb ["experienced"], :object ["enlightenment"]}}}

Would something like this make sense to add? Is there some other property of the rule syntax that can achieve this already? Happy to contribute to make it happen.

turbopape commented 7 years ago

Hey, thanks for using postagga! I'm really happy that you could plug in another tagger; it means the design is modular enough for you to use the parser :) What you're suggesting makes perfect sense. I'm aware that a plain list of steps isn't expressive enough, as it only describes one linear branch of parsing: this, then this, then... I totally agree that we should be able to add "forking" as you suggest, so that we have a proper "automaton" that parses our "grammars". But I wouldn't put this in the optional steps. In fact, I'm going to ditch the optional-steps vector altogether and add a kind of "grammar" matrix describing the possible ways to go from one step to the next. That will handle OR in particular, and branching in a more general way, for instance: step1 -> step2, step3; step3 -> ... I'm definitely going to work on this. But could you fork the project and add an example of how you plugged in the Stanford tagger, please? Would you mind sharing the model / POS tagger? Cheers, Rafik
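The "grammar matrix" idea can be sketched as a transition map from each step to the set of steps allowed to follow it, which covers both plain sequencing and OR branching. This is a hypothetical illustration only, not how postagga implements it: `grammar`, `accepts?` and the `:start`/`:end` sentinel steps are invented names, and the sketch only checks whether a sequence of step ids is a valid path, ignoring token and tag matching entirely.

```clojure
;; Hypothetical sketch, NOT postagga's API: each step maps to the set of
;; steps allowed to follow it. :start and :end are invented sentinels.
(def grammar
  {:start          #{:actor-singular :actor-plural} ; OR: either actor step
   :actor-singular #{:verb}
   :actor-plural   #{:verb}
   :verb           #{:object}
   :object         #{:end}})

(defn accepts?
  "True when `steps` forms a path through `grammar` from :start to :end,
  i.e. every consecutive pair of steps is an allowed transition."
  [grammar steps]
  (let [path (concat [:start] steps [:end])]
    (every? (fn [[from to]]
              ;; unknown steps have no successors
              (contains? (get grammar from #{}) to))
            (partition 2 1 path))))

(accepts? grammar [:actor-singular :verb :object])               ;; => true
(accepts? grammar [:actor-plural :verb :object])                 ;; => true
(accepts? grammar [:verb :object])                               ;; => false
(accepts? grammar [:actor-singular :actor-plural :verb :object]) ;; => false
```

Because each step maps to a set of successors, an OR is just a set with more than one member, and a plain linear rule is the special case where every set has exactly one element.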

turbopape commented 7 years ago

Or I could simply specify a vector of steps, like so:

          [:actor-singular
           #{:multi :get-value
             #{"NN"} #{"NNP"}} ;; singular, singular proper
           :actor-plural
           #{:multi :get-value
             #{"NNS"} #{"NNPS"}}] ;; plural, plural proper

That is, "this OR this". I'll probably start this way, as it would take less time to implement. Of course, I'll keep the optional-steps vector with this approach. Thank you for sharing your thoughts, I really appreciate it!
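To make the "this OR this" semantics concrete, here is a small standalone Clojure sketch. It is hypothetical and does not use postagga's API or its rule syntax; `actor-or-group`, `match-or-group` and `or-step->data` are made-up names. An OR group is modeled as a vector of `[step-id tag-set]` alternatives, and the first alternative whose tag set contains the token's POS tag decides which key ends up in the result, which is what lets singular and plural actors be told apart.

```clojure
;; Hypothetical sketch, NOT postagga's API: an OR group of steps is a
;; vector of [step-id tag-set] alternatives, tried in order.
(def actor-or-group
  [[:actor-singular #{"NN" "NNP"}]     ; singular, singular proper
   [:actor-plural   #{"NNS" "NNPS"}]]) ; plural, plural proper

(defn match-or-group
  "Returns [step-id token] for the first alternative whose tag set
  contains the token's POS tag, or nil when no alternative matches."
  [or-group [token tag]]
  (some (fn [[step-id tags]]
          (when (contains? tags tag)
            [step-id token]))
        or-group))

(defn or-step->data
  "Turns a successful match into a fragment shaped like the :data maps
  in the examples above, e.g. {:actor-plural [\"people\"]}; nil otherwise."
  [or-group tagged-token]
  (when-let [[step-id token] (match-or-group or-group tagged-token)]
    {step-id [token]}))

(or-step->data actor-or-group ["bob" "NN"])     ;; => {:actor-singular ["bob"]}
(or-step->data actor-or-group ["people" "NNS"]) ;; => {:actor-plural ["people"]}
(or-step->data actor-or-group ["ran" "VBD"])    ;; => nil
```

Returning nil when no alternative matches is what makes the group required: a real parser would treat that as a failed step rather than silently skipping it, unlike an optional step.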

turbopape commented 6 years ago

I can help anyone willing to work on this for #Hacktoberfest!

turbopape commented 6 years ago

Hey @milt! I added the :!OR! operator to address your request! Could you please test it?