Open ojak opened 9 years ago
The best way would be to build a custom dictionary and search/replace for the specific words. Currently, you're using the default tagger (which is :lingua). You could also try the alternate taggers (:brill or :stanford). The specifics of each tagger are abstracted away from the interface, so "adding a word to the parsing dictionary" would require creating a base class for the taggers (https://github.com/louismullie/treat/tree/master/lib/treat/workers/lexicalizers/taggers) that handles an :override_tags option, and plugging it into the initialize methods of the child classes.
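As a rough sketch of the custom-dictionary idea, something like the following could be run over a tagger's output. It is plain Ruby and does not call Treat at all: `OVERRIDE_TAGS` and `override_tags` are hypothetical names made up for illustration, and the input is assumed to be an array of `[word, tag]` pairs that you have already pulled out of whichever tagger you use.

```ruby
# Hypothetical helper, not part of Treat: a custom dictionary mapping
# lowercase words to the tag they should carry (Penn Treebank tags assumed).
OVERRIDE_TAGS = { 'spicy' => 'JJ' }.freeze

# Takes an array of [word, tag] pairs and returns a new array in which
# any word found in the dictionary gets its overriding tag.
# Words not in the dictionary keep the tag the tagger assigned.
def override_tags(tagged_words, overrides = OVERRIDE_TAGS)
  tagged_words.map do |word, tag|
    [word, overrides.fetch(word.downcase, tag)]
  end
end

# Example input, written by hand to mimic a tagger's output.
tagged = [['The', 'DT'], ['spicy', 'FW'], ['soup', 'NN']]
corrected = override_tags(tagged)
```

Doing the lookup on the downcased word keeps the dictionary small, at the cost of not distinguishing capitalized proper nouns. If that matters for your text, an exact-match lookup would be the safer choice.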
OK, thanks. I'll look into that approach and let you know how it goes.
There are words that are missing or mis-identified by the language parser. Is there a way to add a word to the parsing dictionary? If not, what would be the best way to handle such cases?
For example, with default settings, the word _spicy_ is tagged as `FW` (foreign word):