huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License

[IX] Extracted metadata in Files persists after Extractor deletion #7139

Closed: RafaPolit closed this issue 2 months ago

RafaPolit commented 2 months ago

When I delete an extractor, all of its suggestions are deleted, but each File (whether it was labeled or not) still holds extracted metadata values, including the selection rectangles for a particular piece of text. When new extractors are created, this data is reported as already labeled. Additionally, a File's extracted metadata does not record which extractor it belongs to.

txau commented 2 months ago

This is working as intended. Keep in mind that the "click-to-fill" feature is not only for machine learning but also for regular entity editing. The selection rectangle belongs to the property, not to the extractor, so the selection rectangles for that property are expected to be kept and to be recovered as already-labeled data if the extractor is recreated.
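To make the distinction concrete, here is a minimal TypeScript sketch of the behavior described above. It assumes a simplified, hypothetical data model (`FileRecord`, `ExtractedMetadata`, `Suggestion`, `Extractor`, `deleteExtractor`, and `labeledFiles` are illustrative names, not Uwazi's actual schema or API): selections on a file are keyed by property name, while suggestions are keyed by extractor. Deleting an extractor therefore drops its suggestions but leaves the selections in place, and a recreated extractor for the same property sees them as labeled data.

```typescript
// Hypothetical, simplified shapes for illustration only; these are
// assumptions, not Uwazi's real schema.
interface SelectionRectangle {
  top: number;
  left: number;
  width: number;
  height: number;
  page: string;
}

interface ExtractedMetadata {
  propertyName: string; // the binding key is the property, not an extractor
  text: string;
  selectionRectangles: SelectionRectangle[];
}

interface FileRecord {
  id: string;
  extractedMetadata: ExtractedMetadata[];
}

interface Extractor {
  id: string;
  property: string;
}

interface Suggestion {
  extractorId: string; // suggestions, unlike selections, belong to an extractor
  fileId: string;
}

// Deleting an extractor removes only its suggestions; file selections are untouched.
function deleteExtractor(extractorId: string, suggestions: Suggestion[]): Suggestion[] {
  return suggestions.filter((s) => s.extractorId !== extractorId);
}

// Any extractor for a property treats files with a selection for that property as labeled.
function labeledFiles(extractor: Extractor, files: FileRecord[]): string[] {
  return files
    .filter((f) => f.extractedMetadata.some((m) => m.propertyName === extractor.property))
    .map((f) => f.id);
}

// Usage: f1 has a click-to-fill selection for "date"; f2 has none.
const files: FileRecord[] = [
  {
    id: "f1",
    extractedMetadata: [
      {
        propertyName: "date",
        text: "2023-01-05",
        selectionRectangles: [{ top: 10, left: 20, width: 80, height: 12, page: "1" }],
      },
    ],
  },
  { id: "f2", extractedMetadata: [] },
];

let suggestions: Suggestion[] = [
  { extractorId: "ex1", fileId: "f1" },
  { extractorId: "ex1", fileId: "f2" },
];

// Delete the original extractor, then recreate one for the same property.
suggestions = deleteExtractor("ex1", suggestions);
const recreated: Extractor = { id: "ex2", property: "date" };
const labeled = labeledFiles(recreated, files);
```

Keying selections by property is what lets manual click-to-fill edits and extractor training share the same labeled data. The trade-off, as raised in the issue, is that the file record itself carries no trace of which extractor (if any) a given selection was used for.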

RafaPolit commented 2 months ago

Ok, I need to confirm that the File's extracted metadata is bound to a particular property. If that is the case, I think I agree that this is working as expected.