Hi @Garstig, the builtin entities rely on builtin parsers, which do indeed have deterministic behavior. However, these parsers do not directly produce the output of the Snips NLU engine. The core machine learning algorithm that extracts entities uses the builtin entity parsers to provide features, but at this stage they are only features. This means that if a builtin entity is detected by the underlying builtin parser but does not correspond to any slot in the sentence, the NLU algorithm is able to recognize this and correctly discard it. In practice, these features are very powerful (which is what we want) and help a lot with the extraction of builtin entities.
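To illustrate what "only features" means, here is a minimal, hypothetical sketch (not the actual Snips NLU feature code) that turns parser matches, in the same format as the parser output shown in this thread, into per-token features. A sequence model such as a CRF weighs these alongside its other features instead of copying them into the output. The whitespace tokenizer, the `builtin_match:<kind>` feature name and the B/I tagging are assumptions made for the sketch.

```python
# Hypothetical sketch: builtin-parser matches become per-token features.
# Feature names and the B/I scheme are illustrative, not Snips NLU internals.


def tokenize(text):
    """Split on whitespace, returning (token, start, end) character spans."""
    tokens = []
    start = None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                tokens.append((text[start:i], start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        tokens.append((text[start:], start, len(text)))
    return tokens


def builtin_match_features(text, matches):
    """Return one feature dict per token. Tokens covered by a parser match get
    a "builtin_match:<entity_kind>" feature set to "B" (first) or "I" (inside).
    These are inputs to the model, not slot labels."""
    tokens = tokenize(text)
    features = [{} for _ in tokens]
    for match in matches:
        start, end = match["range"]["start"], match["range"]["end"]
        covered = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        for position, index in enumerate(covered):
            features[index]["builtin_match:" + match["entity_kind"]] = "B" if position == 0 else "I"
    return features


if __name__ == "__main__":
    text = "wo muss ich nach dem mittag hin"
    # Same shape as the parser output in this thread (resolved_value omitted).
    matches = [{"value": "mittag", "entity_kind": "snips/datetime",
                "range": {"start": 21, "end": 27}}]
    for (token, _, _), feats in zip(tokenize(text), builtin_match_features(text, matches)):
        print(token, feats)
```

Only the "mittag" token carries the builtin feature; whether it ends up as a snips/datetime slot is still decided by the trained model.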
Parsing errors on builtin entities may have two causes:
If you want to make sure that you have properly labelled the builtin entities, you can check what is detected by the underlying builtin entity parser by running the following:
>>> from snips_nlu.entity_parser import BuiltinEntityParser
>>> parser = BuiltinEntityParser.build(language="de")
>>> parser.parse("wo muss ich nach dem mittag hin")
[{'value': 'mittag', 'resolved_value': {'kind': 'InstantTime', 'value': '2020-01-16 12:00:00 +01:00', 'grain': 'Hour', 'precision': 'Exact'}, 'entity_kind': 'snips/datetime', 'range': {'start': 21, 'end': 27}}]
Then adjust the training data accordingly if you find any mismatches.
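The check above can be automated over a whole dataset. The following is a minimal sketch, assuming labelled slots are stored as dicts with hypothetical "start", "end" and "entity" keys; the parser is passed in as a callable, and a stub stands in for it so the example runs without snips_nlu installed.

```python
# Hypothetical helper: flag labelled builtin slots that the builtin parser does
# not detect with exactly the same range and entity kind.


def find_mismatches(text, labelled_slots, parse):
    """labelled_slots: dicts with "start", "end" and "entity" keys (assumed format).
    parse: a callable returning matches shaped like the parser output above."""
    detected = {(m["range"]["start"], m["range"]["end"], m["entity_kind"])
                for m in parse(text)}
    return [slot for slot in labelled_slots
            if (slot["start"], slot["end"], slot["entity"]) not in detected]


def stub_parse(text):
    """Stand-in for BuiltinEntityParser.build(language="de").parse so the sketch
    runs without snips_nlu; it only knows about the word "mittag"."""
    start = text.find("mittag")
    if start == -1:
        return []
    return [{"value": "mittag", "entity_kind": "snips/datetime",
             "range": {"start": start, "end": start + len("mittag")}}]


if __name__ == "__main__":
    text = "wo muss ich nach dem mittag hin"
    good = [{"start": 21, "end": 27, "entity": "snips/datetime"}]  # "mittag"
    bad = [{"start": 17, "end": 27, "entity": "snips/datetime"}]   # "dem mittag"
    print(find_mismatches(text, good, stub_parse))
    print(find_mismatches(text, bad, stub_parse))
```

With snips_nlu installed, you would pass the real parser's `parse` method instead of the stub. The exact-range comparison is deliberately strict: a reported mismatch is a hint to double-check the annotation, not proof that it is wrong.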
I hope this helps. Best
Hi @adrienball!
Thanks a lot for your help again!
Are any other features used for builtin entities? I don't think so, since in #863 I asked whether builtin entities are extensible and you said no. I just want to make sure I understood you correctly.
Do you have documentation that explains all the features you use for slot filling? The config file lists all the standard features, but I don't understand all of them just from their names. I hope this is not too off-topic for this issue.
Best, Garstig
The features used in the CRF are documented here.
Builtin entities extracted by Snips NLU are resolved (datetimes, for instance, are returned with rich content), and for this reason they must correspond to something parsed by the underlying builtin entity parsers. However, as I said earlier in the thread, this is not sufficient. In other words, if a string is not parsed by one of our builtin entity parsers, it won't be output by the Snips NLU engine. Best
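The "necessary but not sufficient" rule can be sketched as follows. This is a simplified, hypothetical illustration: it assumes the model's predicted slots are dicts with "start", "end" and "entity" keys and only accepts exact range matches; the real engine may differ in its details.

```python
# Simplified, hypothetical sketch of the rule "a parser match is necessary but
# not sufficient". Slot and match formats are assumptions for illustration.


def combine(model_slots, parser_matches):
    """Keep a builtin slot predicted by the sequence model only if the builtin
    parser matched the same range with the same entity kind, and attach the
    parser's resolved value."""
    resolved = {(m["range"]["start"], m["range"]["end"], m["entity_kind"]): m["resolved_value"]
                for m in parser_matches}
    output = []
    for slot in model_slots:
        key = (slot["start"], slot["end"], slot["entity"])
        if key in resolved:  # necessary: the parser must have matched this span
            output.append({**slot, "resolved_value": resolved[key]})
    # Not sufficient: parser matches the model did not predict are never output.
    return output
```

With the parser output shown earlier for "wo muss ich nach dem mittag hin", `combine([], matches)` returns nothing (the match is discarded because no slot was predicted), a predicted snips/datetime slot over "hin" is dropped (the parser did not match it), and a predicted slot over "mittag" is returned with its resolved datetime.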
Thank you very much! This explains a lot.
Have a nice weekend :)
Hi,
for a project, I validated my slot extraction with 10-fold cross-validation. For some reason, the results for the builtin entities are not deterministic. Shouldn't they be, since they are rule-based?
Example:
wo muss ich nach dem mittag hin (roughly "where do I have to go after midday?")
Here I labelled "mittag" as snips/datetime. It was identified correctly only 1 out of 10 times. Also, the F1 score I calculated for the snips/datetime slot increased from 75% to 80% when I used synonyms for some other slots. Am I missing something?
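A minimal sketch of how a per-slot F1 over 10-fold cross-validation can be computed, assuming slots are (start, end, entity) character-range triples and using a hypothetical train_and_predict callable in place of training a Snips NLU engine; it is not necessarily how the F1 above was computed. Each fold trains a fresh model on a different part of the data, so folds can disagree on similar utterances even though the builtin parser itself is deterministic. Fixing the shuffle seed only makes the splits reproducible.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds; the fixed seed makes
    the shuffle, and therefore the folds, identical on every run."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for i in range(k):
        test = indices[i::k]
        test_set = set(test)
        train = [j for j in indices if j not in test_set]
        yield train, test


def slot_prf(gold, predicted, entity):
    """Exact-match precision, recall and F1 for one entity type. gold and
    predicted hold one set of (start, end, entity) triples per utterance."""
    tp = fp = fn = 0
    for gold_slots, pred_slots in zip(gold, predicted):
        g = {s for s in gold_slots if s[2] == entity}
        p = {s for s in pred_slots if s[2] == entity}
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_val_slot_f1(texts, gold, train_and_predict, entity, k=10, seed=0):
    """Pool test-fold predictions over all k folds, then score them once.
    train_and_predict(train_texts, train_gold, test_texts) is a hypothetical
    callable that trains a fresh model per fold and returns predicted slots."""
    pooled_gold, pooled_pred = [], []
    for train, test in k_fold_splits(len(texts), k, seed):
        predictions = train_and_predict([texts[j] for j in train],
                                        [gold[j] for j in train],
                                        [texts[j] for j in test])
        pooled_gold.extend(gold[j] for j in test)
        pooled_pred.extend(predictions)
    return slot_prf(pooled_gold, pooled_pred, entity)
```

Pooling all folds before scoring is one common choice; averaging the per-fold F1 scores is another, and the two can give slightly different numbers.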