louismullie / treat

Natural language processing framework for Ruby.

Custom Innovators for name_tag #90

Closed nazarhussain closed 9 years ago

nazarhussain commented 9 years ago

Hi,

I am working on a scenario that this great library suits perfectly. The problem I am facing is the limited set of tags available for named entities. I have to extract other kinds of information, such as dates, national ID numbers, etc.

Can you suggest the best approach for extending this library and adding my own logic to tag information?

Any material or examples would be helpful, and I would appreciate your effort.

louismullie commented 9 years ago

Hi,

The best way would be to use custom models for the Stanford NER system: http://nlp.stanford.edu/software/crf-faq.shtml#a
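As a rough illustration of the data preparation this involves: the FAQ linked above describes training the CRF classifier from a tab-separated file with one token per line and its label in the second column (matching `map = word=0,answer=1` in the training properties), with `O` as the background label. The sketch below builds such a file's contents in plain Ruby. The helper name `to_ner_tsv`, the example sentence, and the label names `DATE` and `NATIONAL_ID` are assumptions made up for this example; they are not part of Treat or Stanford NER.

```ruby
# Hypothetical helper (not part of Treat or Stanford NER): turns sentences
# given as [token, label] pairs into two-column, tab-separated text.
# Tokens with a nil label receive the background label "O".
# Sentences are separated by a blank line.
def to_ner_tsv(sentences)
  sentences.map do |sentence|
    sentence.map { |token, label| "#{token}\t#{label || 'O'}" }.join("\n")
  end.join("\n\n") + "\n"
end

# Made-up annotated sentence; DATE and NATIONAL_ID are illustrative labels.
sentences = [
  [['Issued', nil], ['on', nil], ['2015-03-14', 'DATE'], ['for', nil],
   ['ID', nil], ['12345-6789012-3', 'NATIONAL_ID'], ['.', nil]]
]

tsv = to_ner_tsv(sentences)
puts tsv
```

The resulting text could be saved to a file and referenced as the `trainFile` in the training properties; a useful model would need far more annotated sentences than this single example.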

You would then need to configure the Stanford CoreNLP gem appropriately to point to the new models.

An alternative would be to use custom models with the tagger from the OpenNLP library (https://github.com/louismullie/open-nlp). Unfortunately, Treat has no integration for it right now. You are welcome to push one if you are interested. It should not be hard to do: it simply needs to replicate the behaviour of the existing Stanford named entity recognizer (https://github.com/louismullie/treat/blob/master/lib/treat/workers/extractors/name_tag/stanford.rb), but with OpenNLP.

I will close this, but let me know if you have any issues.

Louis