Closed: theconnectionist closed this issue 8 years ago
The only model creation methods in MITIE are the ones documented in the example programs. So you would need to create a new dataset that contained the union of all the entities you wanted to deal with and train on that.
You also seem to be asking if MITIE supports user-generated features like "is in my dictionary". There is no API for that, since I found gazetteers don't make much of a difference in accuracy and they complicate the user workflow. That said, the C++ code for running MITIE isn't that complicated, so you could add your own additional features by editing it if you wanted to. At the end of the day MITIE is just a simple application of this dlib tool, which is fully documented, so it's easy to modify.
But I wouldn't worry about that. The thing to do is make a single unified training dataset that captures what you want to do and train a model based on that dataset.
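As a rough illustration of what "a single unified training dataset" could look like, here is a minimal Python sketch that merges two annotated datasets into one before training. Everything in it is an assumption made for illustration: the `Sample` container, the `merge_datasets` and `label_set` helpers, and the example sentences and labels are hypothetical and not part of MITIE. Each merged sample would still need to be converted into a training instance the way MITIE's example programs do.

```python
from collections import namedtuple

# Hypothetical container (not a MITIE type): a tokenized sentence plus
# entity spans given as (start, end, label) with an exclusive end index.
Sample = namedtuple("Sample", ["tokens", "entities"])


def overlaps(a, b):
    """Return True if two (start, end, label) spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_datasets(*datasets):
    """Combine several annotated datasets into one.

    Sentences with identical tokens are merged into a single sample.
    When two spans overlap, the first one seen is kept and the later
    one is dropped, so the result never contains overlapping entities.
    """
    merged = {}
    for dataset in datasets:
        for sample in dataset:
            spans = merged.setdefault(tuple(sample.tokens), [])
            for span in sample.entities:
                if not any(overlaps(span, kept) for kept in spans):
                    spans.append(span)
    return [Sample(list(tokens), sorted(spans)) for tokens, spans in merged.items()]


def label_set(dataset):
    """Return the sorted list of entity labels used anywhere in the dataset."""
    return sorted({label for sample in dataset for (_, _, label) in sample.entities})


# Made-up example data: one general-purpose dataset and one domain dataset.
general = [
    Sample(["Davis", "King", "wrote", "MITIE", "."], [(0, 2, "person")]),
]
domain = [
    Sample(["Davis", "King", "wrote", "MITIE", "."], [(3, 4, "software")]),
    Sample(["Aspirin", "reduces", "fever", "."], [(0, 1, "drug")]),
]

unified = merge_datasets(general, domain)
labels = label_set(unified)
```

The point of merging first is that the trainer sees every entity type in one pass, rather than trying to combine separately trained models afterwards.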
Hi Davis,
Thank you so much for this high-performance open source library. I have one question that I couldn't find an answer to wrt training the entity recognizer.
I would like to take advantage of already known entities, but also be able to recognize entities not already known to the dictionary. For example, the Wikidata project provides millions of entities and it would be nice to seed the model with those known entities. A couple of approaches I can think of: