huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License

IX: Select languages for metadata extractor #7479

Open gabriel-piles opened 1 week ago

gabriel-piles commented 1 week ago

When instantiating a metadata extractor, the user should be able to select which languages it targets, so that only PDFs in the selected languages are used.

Users may choose to process all languages (the current behavior), a single language, or multiple languages.
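A minimal sketch of how the three options could be modelled, offered only as an illustration of the request. The names `LanguageSelection`, `PdfFile`, and `filterPdfsByLanguage` are hypothetical and do not come from the Uwazi codebase; the sketch also assumes each PDF carries a language code.

```typescript
// Hypothetical types for illustration; not Uwazi's actual data model.
// 'all' keeps the current behavior; an array covers one or several languages.
type LanguageSelection = 'all' | string[];

interface PdfFile {
  filename: string;
  language: string; // assumed to be a language code such as "en" or "es"
}

// Returns the PDFs an extractor would use under the given selection.
function filterPdfsByLanguage(files: PdfFile[], selection: LanguageSelection): PdfFile[] {
  if (selection === 'all') {
    return files; // no filtering: every language is processed
  }
  // Compare case-insensitively so "EN" and "en" are treated alike.
  const allowed = new Set(selection.map(code => code.toLowerCase()));
  return files.filter(file => allowed.has(file.language.toLowerCase()));
}

// Sample data, invented for this example.
const files: PdfFile[] = [
  { filename: 'a.pdf', language: 'en' },
  { filename: 'b.pdf', language: 'es' },
  { filename: 'c.pdf', language: 'fr' },
];

const everything = filterPdfsByLanguage(files, 'all');
const englishOnly = filterPdfsByLanguage(files, ['en']);
const englishAndSpanish = filterPdfsByLanguage(files, ['en', 'es']);
```

Using a single union type keeps "all languages" as an explicit choice rather than overloading an empty list, which here simply selects nothing.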