[IX] Add support for "empty" being the labeled data

huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections

http://www.uwazi.io

MIT License


[IX] Add support for "empty" being the labeled data #6979

Open aphilop opened 5 months ago

aphilop commented 5 months ago

This should be displayed in the Suggestions column and also in the Stats & Filter Panel.

FIX BUG: When accepting a SELECT type suggestion, if the suggestion is empty, the server returns the error "Id is invalid: name_of_the_tesauri" because it tries to look up the suggested value, which is empty, in the thesaurus. This bug can now only happen when bulk accepting.

The UI should allow users to individually accept SELECT, TEXT and NUMERIC suggestions with empty values.
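A rough sketch of the guard this fix seems to call for, assuming the accept step maps a suggested label to a thesaurus id. The names `resolve_select_value`, `bulk_accept` and `InvalidThesaurusId` are hypothetical and are not taken from the Uwazi codebase:

```python
from typing import Optional


class InvalidThesaurusId(Exception):
    """Raised when a suggested value has no matching thesaurus entry."""


def resolve_select_value(suggested: Optional[str], thesaurus: dict) -> Optional[str]:
    """Map a suggested label to a thesaurus id (hypothetical helper).

    An empty suggestion is treated as "clear the value", so it never
    reaches the thesaurus lookup that currently fails.
    """
    if suggested is None or suggested.strip() == "":
        return None
    if suggested not in thesaurus:
        raise InvalidThesaurusId(f"Id is invalid: {suggested}")
    return thesaurus[suggested]


def bulk_accept(suggestions: list, thesaurus: dict) -> list:
    """Accept many suggestions at once, empty ones included."""
    return [resolve_select_value(s, thesaurus) for s in suggestions]
```

With this shape, bulk accepting a mix of filled and empty suggestions would clear the empty ones instead of failing the whole request.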

txau commented 5 months ago

In the stats and filters this becomes labeled data. So:

When a row in the table is labeled as "empty" and the model is returning "empty" it will be a "Match".
When a row in the table is labeled as "empty" and the model is returning suggestions it will be a "Mismatch".

So I guess we don't need to modify the stats UI, but just make sure that the backend is placing the right data in the right place.
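To make the two rules above concrete, here is a minimal sketch of how the backend could classify a labeled row. The function name `classify` and its parameters are made up for illustration. Only the two "labeled as empty" branches come from this thread; the remaining branches are assumptions about the non-empty cases:

```python
def classify(labeled_empty: bool, labeled_values: list,
             suggestion_empty: bool, suggested_values: list) -> str:
    """Return "Match" or "Mismatch" for a row that already has labeled data."""
    if labeled_empty:
        # From the thread: empty label vs. empty suggestion is a Match,
        # empty label vs. any suggestion is a Mismatch.
        return "Match" if suggestion_empty else "Mismatch"
    if suggestion_empty:
        # Assumption: a labeled value with an empty suggestion is a Mismatch.
        return "Mismatch"
    # Assumption: non-empty values match when they are equal regardless of order.
    same = sorted(labeled_values) == sorted(suggested_values)
    return "Match" if same else "Mismatch"
```

Because "empty" lands in the existing Match and Mismatch buckets, the stats UI itself would not need a new state.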

In the table, since the option to accept a suggestion already exists there, maybe we should also have the option to accept an "empty" value as the correct value and labeled data. For the sake of simplicity, we could skip this change for now and allow setting the "empty" value only from the side panel.

Where we definitely need to be able to set the "empty" value is in the side panel.

juanmnl commented 4 months ago

@txau Given that both can have an empty state, I guess the only way to filter through them is to add a nested filter to Match and Mismatch. That opens the possibility of adding more granular filtering options, but it will also mean adding a "partial" state for the checkboxes and maybe a "collapse/expand" action to the main one at some point.

txau commented 4 months ago

I think we don't need to modify the filters. They are OK as they are: if the property is LABELED as empty, it may match or mismatch, and we don't need to differentiate the empty status in the filters.

What we need is a way to set and check this value in the form.

It was suggested to add a fixed extra option to the multiselect saying "empty". But this is too obscure. I think we need a more obvious and quicker way of handling the empty value.

juanmnl commented 4 months ago

We are adding an action to the bottom bar of the hub when viewing a PDF. It will allow users to quickly label the whole document as [empty].

If any values are selected (checkboxes), this action should deselect them all.

Once the action has been triggered, the button turns inactive and a message appears explaining that the document has been [labeled as empty]. This lasts until a new value is selected.
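The behaviour described above could be modelled roughly like this. It is only a sketch of the state transitions, written in Python for consistency with the service contract rather than as the actual frontend code. The class `LabelPanelState` and its members are hypothetical:

```python
class LabelPanelState:
    """Hypothetical model of the "label as empty" action state."""

    def __init__(self, selected=None):
        self.selected = set(selected or [])
        self.labeled_as_empty = False

    def label_as_empty(self):
        # Triggering the action deselects every checked value.
        self.selected.clear()
        self.labeled_as_empty = True

    def select(self, value):
        # Selecting a new value ends the "labeled as empty" state.
        self.selected.add(value)
        self.labeled_as_empty = False

    @property
    def empty_action_enabled(self):
        # The button stays inactive while the document is labeled as empty.
        return not self.labeled_as_empty

    @property
    def message(self):
        return "labeled as empty" if self.labeled_as_empty else ""
```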

Designs

gabriel-piles commented 3 months ago

This change also affects the contract with the service. We want to be explicit about whether a value is empty and whether the suggested prediction is empty. My suggestion for passing this information back and forth to the service is as follows:

LabeledData:

tenant: str = ""
id: str = ""
xml_file_name: str = ""
entity_name: str = ""
language_iso: str = ""

--> empty_value: bool = False

label_text: str = ""
values: list[Option] = list()
source_text: str = ""
page_width: float = 0
page_height: float = 0
xml_segments_boxes: list[SegmentBox] = list()
label_segments_boxes: list[SegmentBox] = list()

Suggestion:

tenant: str
id: str
xml_file_name: str = ""
entity_name: str = ""

--> empty_suggestion: bool = False

text: str = ""
values: list[Option] = list()
segment_text: str = ""
page_number: int = 1
segments_boxes: list[SegmentBox] = list()
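For illustration, the two new flags could look like this as plain dataclasses. This is a sketch with only a subset of the fields listed above. The shape of `Option` is a placeholder because it is not defined in this thread, and the consistency check in `__post_init__` is an assumption, not part of the proposed contract:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Option:
    # Placeholder shape: the thread does not define Option.
    id: str = ""
    label: str = ""


@dataclass
class LabeledData:
    tenant: str = ""
    id: str = ""
    empty_value: bool = False  # new flag from the proposal
    label_text: str = ""
    values: list[Option] = field(default_factory=list)

    def __post_init__(self):
        # Assumed rule: an explicitly empty label carries no text or values.
        if self.empty_value and (self.label_text or self.values):
            raise ValueError("empty_value=True conflicts with label_text/values")


@dataclass
class Suggestion:
    tenant: str
    id: str
    empty_suggestion: bool = False  # new flag from the proposal
    text: str = ""
    values: list[Option] = field(default_factory=list)


# A property explicitly labeled as empty, and a prediction that is also empty.
labeled = LabeledData(tenant="demo", id="property_1", empty_value=True)
suggestion = Suggestion(tenant="demo", id="property_1", empty_suggestion=True)
payload = asdict(labeled)  # dict that could be serialized for the service
```

Keeping the flag separate from `values` lets both sides distinguish "explicitly empty" from "not labeled yet", which an empty list alone cannot express.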

txau commented 2 months ago

@gabriel-piles I think this contract is accepted by the backend developers.