Lucy is an entity recognition engine that defines a simple YAML syntax for recognizing entities in text.
Lucy's core concept is that:
Entities are cascading patterns of tokens and entities.
Lucy captures your knowledge as token patterns.
Token patterns are a natural way to build a model because they let you focus on the fragments of information you understand, in a way that is instantly testable and that naturally builds up to more complex entities.
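To make the idea of cascading patterns concrete, here is a minimal Python sketch. It is not Lucy's engine and it does not use Lucy's YAML syntax; the `ENTITIES` table, the `@name` reference convention, and the `tokenize`, `match_entity`, and `recognize` helpers are all hypothetical names chosen for illustration. It only shows how an entity defined from tokens (`@size`, `@drink`) can be reused inside a larger entity (`@order`).

```python
import re

# Hypothetical toy model, NOT Lucy's actual engine or YAML syntax.
# Each entity has a list of patterns. A pattern is a sequence of items,
# where an item is either a literal token or a reference to another
# entity (written "@name" here purely as an illustrative convention).
ENTITIES = {
    "@size": [["small"], ["medium"], ["large"]],
    "@drink": [["coffee"], ["tea"]],
    "@order": [["@size", "@drink"]],  # built from other entities
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_entity(name, tokens, start):
    """Return the set of end positions where `name` matches from `start`."""
    ends = set()
    for pattern in ENTITIES[name]:
        positions = {start}
        for item in pattern:
            next_positions = set()
            for pos in positions:
                if item.startswith("@"):
                    # Cascade: an entity reference matches wherever
                    # the referenced entity matches.
                    next_positions |= match_entity(item, tokens, pos)
                elif pos < len(tokens) and tokens[pos] == item:
                    next_positions.add(pos + 1)
            positions = next_positions
            if not positions:
                break
        ends |= positions
    return ends


def recognize(text):
    """Return (entity name, matched text) pairs for every match in `text`."""
    tokens = tokenize(text)
    results = []
    for name in ENTITIES:
        for start in range(len(tokens)):
            for end in sorted(match_entity(name, tokens, start)):
                results.append((name, " ".join(tokens[start:end])))
    return results


if __name__ == "__main__":
    for name, text in recognize("I want a large coffee"):
        print(name, "->", text)
```

In this sketch, adding a new size or drink means adding one small pattern, and `@order` picks it up without any further change, which is the kind of reuse that cascading patterns are meant to give you.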
Lucy has two core design goals:
When authoring, you want immediate feedback on the impact of your changes. Lucy achieves this by not requiring any training, or even a CLI tool invocation, to see the results of your changes.
Traditional ML systems require you to "invent" many permutations of sentence structure and then laboriously label them so the system can learn your patterns.
Lucy focuses on a syntax that needs minimal changes to achieve your desired outcome and that maximizes reuse.
The Lucy engine works perfectly well on its own, but it can also be coupled with existing language understanding (LU) engines such as LUIS or Orchestrator to great benefit.
There are two factors controlling language support in Lucy.
Lucy uses off-the-shelf Lucene token analyzers to split text into tokens. There are currently token analyzers for 29 languages.
The Microsoft.Text.Recognizers libraries provide support for core entity types.
Microsoft.Text.Recognizers supports 15 languages.
See Microsoft.Text.Recognizers for a complete listing of languages and support.