Demo and benchmarks for building an NLU engine similar to those in voice assistants. Several intent classifiers are implemented and benchmarked. Conditional Random Fields (CRFs) are used for entity extraction.
We want users, in this case most likely myself and any other developers, to benchmark an NLU data set for entity extraction and to be able to refine those entities to improve the data set.
Making sure the “human in the for loop” flow works comes from refining the entities; however, minor improvements will inevitably be needed along the way when they block the refinement process. This is a catch-all ticket for collecting such minor code fixes.
User stories
As a user, I want to:
- see a visual analysis of the entities, so that I know what needs to be improved
- review all the incorrectly extracted entities, so that I can fix them
Sounds easy, right? Especially since we have already built the intent refinement workflow. But it is a bit more complex than that.
With intents, we could visualize all domains, see where the intents perform worst, pick those domains, and then review all the incorrectly classified intents in each for refinement. With entities, it is trickier.
We'll follow the same flow, but with entities. First, we need to group the entities within a domain, and there will be overlap: some utterances contain more than one entity type, so we have to keep track of that. Furthermore, when an utterance is flagged, do we ask the user to refine all of its entities, or to ignore the rest? It would be very annoying to revisit the same utterance two or more times, which is why users should review all of an utterance's entities at the same time. This is harder for the user, who must judge, for each entity, whether it is correct and, if not, what it should be.
It is therefore better for a user to review incorrect entities in batches. They should first get an overview of example entries in the domain where the entities are correct, then work through corrections no more than 100 at a time.
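As a sketch of this batching idea (all names here are hypothetical, not the repo's actual API): group the incorrect predictions by utterance so each utterance is reviewed only once, even when several of its entity types are flagged, then cut the review queue into batches of at most 100.

```python
from collections import defaultdict

def review_batches(incorrect, batch_size=100):
    """Group flagged entity predictions by utterance, then batch them.

    `incorrect` is a list of (utterance, entity_type, predicted, expected)
    tuples. Each utterance appears once in the queue, carrying all of its
    flagged entities, so the reviewer never revisits the same text.
    """
    by_utterance = defaultdict(list)
    for utterance, entity_type, predicted, expected in incorrect:
        by_utterance[utterance].append((entity_type, predicted, expected))

    queue = list(by_utterance.items())
    return [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]

incorrect = [
    ("wake me at seven", "time", "seven", "7 am"),
    ("wake me at seven", "action", None, "wake"),
    ("play some jazz", "genre", "some jazz", "jazz"),
]
batches = review_batches(incorrect, batch_size=2)
# three flagged entities collapse into two utterances -> a single batch of 2
```

The grouping step is what keeps a multi-entity utterance from showing up in two different batches.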
This means, however, that we have to adapt the intent refinement flow. For intent refinement we recorded results into CSVs by domain and intent; here we will record them by domain in batches, then merge the batch files into a single CSV for the whole domain. If the user is lucky, they will only need one batch per domain.
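A minimal sketch of the per-domain merge step, assuming batch files named like `<domain>_batch_<n>.csv` (a hypothetical naming scheme, not necessarily the repo's):

```python
import csv
import glob

def merge_domain_batches(domain, out_path):
    """Concatenate a domain's batch CSVs into one file, keeping one header."""
    paths = sorted(glob.glob(f"{domain}_batch_*.csv"))
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)  # every batch repeats the header row
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)
```

Sorting the glob results keeps batch order stable, so the merged file reads in the same order the user reviewed.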
DoD
[x] benchmark entities over whole data set
[x] graph analysis of entities for the whole data set
[x] benchmark entities per domain
[x] graphs of entities per domain
[ ] add incorrect_entities_report to macro_entities_refinement.py
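A hedged sketch of what the entity benchmarks above might compute: per-entity-type precision, recall, and F1 from gold versus predicted spans. The data shapes and names are illustrative, not the repo's actual ones.

```python
from collections import Counter

def entity_scores(gold, pred):
    """Per-entity-type precision/recall/F1 over (utterance_id, type, span) triples."""
    tp, fp, fn = Counter(), Counter(), Counter()
    gold_set, pred_set = set(gold), set(pred)
    for item in pred_set:
        # a predicted span counts as a true positive only on exact match
        (tp if item in gold_set else fp)[item[1]] += 1
    for item in gold_set - pred_set:
        fn[item[1]] += 1
    scores = {}
    for etype in set(tp) | set(fp) | set(fn):
        p = tp[etype] / (tp[etype] + fp[etype]) if tp[etype] + fp[etype] else 0.0
        r = tp[etype] / (tp[etype] + fn[etype]) if tp[etype] + fn[etype] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[etype] = {"precision": p, "recall": r, "f1": f1}
    return scores

gold = [(0, "time", "seven"), (1, "genre", "jazz")]
pred = [(0, "time", "seven"), (1, "genre", "some jazz")]
scores = entity_scores(gold, pred)
# time: exact match -> P = R = F1 = 1.0; genre: one fp + one fn -> all 0.0
```

Running this per domain (filtering the triples first) gives the per-domain numbers the DoD asks for, and the false positives/negatives it collects are exactly the rows an incorrect-entities report would surface.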