Closed: dbose closed this issue 10 years ago
@dbose Thanks! I've never personally done any work with lemmatization. Do you think PEGs would be a good fit for it?
P.S. Closing this since it isn't really an issue.
Definitely not, PEGs on their own aren't good for such NLP processes.
I think I phrased it incorrectly. What I meant was: is there a way to match only the lemmatized form of a word? I handled it in the following way (it's a hack that only takes care of adjective forms):
    rule pre_modifier_token
      modifier ('d' | 'ed' | 'ped')*
    end
For example, in pyparsing I can hook a custom function into the PEG (https://github.com/JoshRosen/cmps140_creative_cooking_assistant/blob/master/nlu/ingredient_line_grammar.py; LemmatizedWord is a custom function there).
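For reference, here is a minimal sketch of that kind of hook using pyparsing's addCondition. It is not the code from the linked repo; lemmatize, KNOWN_LEMMAS, and lemmatized_word are made-up stand-ins for illustration.

    # Toy pyparsing grammar where a custom function decides whether a word matches.
    # `lemmatize` and `KNOWN_LEMMAS` are hypothetical stand-ins for a real
    # lemmatizer and vocabulary; this is not the LemmatizedWord from the linked repo.
    from pyparsing import Word, alphas

    KNOWN_LEMMAS = {"chop", "dice", "pepper"}

    def lemmatize(word):
        # crude suffix stripping, in the same spirit as the ('d' | 'ed' | 'ped') hack above
        for suffix in ("ped", "ed", "d"):
            if word.endswith(suffix):
                return word[:-len(suffix)]
        return word

    # addCondition hooks the custom function into the PEG: the token only
    # matches when the word's lemma is in the known vocabulary.
    lemmatized_word = Word(alphas).addCondition(
        lambda toks: lemmatize(toks[0]) in KNOWN_LEMMAS
    )

    print(lemmatized_word.parseString("chopped"))  # -> ['chopped']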
Another way would be to build lemmatization and other IE capabilities on top of the PEG, but it would be excellent to be able to hook custom functions into the stream.
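To make the "on top of the PEG" route concrete, a rough sketch (again only in pyparsing terms, reusing the hypothetical lemmatize helper from the previous sketch) would capture raw words first and normalize them in a separate pass:

    # Post-processing sketch: parse with a plain grammar first, then lemmatize
    # the captured words afterwards ("on top of the PEG").
    from pyparsing import OneOrMore, Word, alphas

    def lemmatize(word):
        # same hypothetical helper as in the previous sketch
        for suffix in ("ped", "ed", "d"):
            if word.endswith(suffix):
                return word[:-len(suffix)]
        return word

    line = OneOrMore(Word(alphas))
    tokens = line.parseString("finely chopped peppers")
    print([lemmatize(t) for t in tokens])  # ['finely', 'chop', 'peppers']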
By the way, I'm using citrus to extract data from recipes and it's looking great so far. Since the domain vocabulary of cooking is rather limited, an ML-based extractor would have been overkill. Thanks again for your work.
I would love to contribute to this (bringing it closer to pyparsing et al.) and raise a pull request with my thoughts on what I meant by custom functions.
Cheers Deb
Ah, thanks for the explanation.
I think you'll probably want to look into subclassing Citrus::Nonterminal to achieve what you're describing. A non-terminal can run custom logic that describes the matching behavior of other rules. In your case, it sounds like you could create a non-terminal that checks for lemmatization and matches (or doesn't) based on that.
In any case, I'd definitely be interested in seeing a PR that implements this.
First of all, awesome work!! I think citrus is very close to pyparsing. Any idea how I can implement a custom parsing Rule, let's say for lemmatization?
Cheers, Deb