huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License
244 stars 80 forks

[IX] Support text fields as source when creating an extractor #7157

Open aphilop opened 3 months ago

aphilop commented 3 months ago

Is your feature request related to a problem? Please describe.
The information extraction feature currently accepts only PDF documents as a source. We want to extend it to text fields within a template, so that information can also be extracted from those fields.

Describe the solution you'd like
The dialogue for creating a new extractor should be extended to display the available text fields as sources for information extraction. An extractor can use either PDF documents or text fields as its source.

@juanmnl will add the discussed designs and description of the flow for this enhancement.

juanmnl commented 3 months ago

We are just adding the new common sources to the "create extractor" modal, if available. These sources are selectable through a radio selector, since only one source can be selected at a time.

[Image: add extractor - template - sources]

Designs
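To make the source selection described above concrete, here is a minimal TypeScript sketch. It is not taken from the Uwazi codebase: the names `ExtractorSource`, `TemplateSketch` and `availableSources` are hypothetical. It also rests on two assumptions: that "common sources" means text fields shared by every selected template, and that PDF always remains available as a source.

```typescript
// Hypothetical types for illustration only; not Uwazi's actual data model.
// A discriminated union mirrors the radio selector: one source at a time.
type ExtractorSource =
  | { kind: 'pdf' }
  | { kind: 'textField'; propertyName: string };

interface TemplateSketch {
  name: string;
  // Names of the text properties defined on this template.
  textProperties: string[];
}

// Lists the sources the modal could offer for the selected templates.
// Assumption: PDF is always offered, and a text field is offered only
// when every selected template defines a property with that name.
function availableSources(templates: TemplateSketch[]): ExtractorSource[] {
  const sources: ExtractorSource[] = [{ kind: 'pdf' }];
  if (templates.length === 0) return sources;

  const [first, ...rest] = templates;
  const common = first.textProperties.filter(name =>
    rest.every(t => t.textProperties.includes(name))
  );

  return sources.concat(
    common.map((propertyName): ExtractorSource => ({ kind: 'textField', propertyName }))
  );
}
```

Because a single `ExtractorSource` value is either the PDF variant or one text field, an extractor built from it cannot end up with two sources selected at once.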

RafaPolit commented 3 months ago

Things to keep in mind while developing this: