IX: Select languages for metadata extractor

huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections

http://www.uwazi.io

MIT License


IX: Select languages for metadata extractor #7479

Open gabriel-piles opened 1 week ago

gabriel-piles commented 1 week ago

When instantiating a metadata extractor, users should be able to select which languages it processes, so that only PDFs in the selected languages are used.

Users may opt to process all languages (the current behavior), a single language, or multiple languages.
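
To make the three options concrete, here is a minimal TypeScript sketch of how the selection could filter the PDFs handed to an extractor. The names `LanguageSelection`, `PdfFile`, and `selectPdfs` are hypothetical and not taken from the Uwazi codebase, and it assumes each PDF already carries a language code.

```typescript
// Hypothetical types for illustration only; not part of the Uwazi codebase.
type LanguageSelection =
  | { mode: "all" } // current behavior: every language is processed
  | { mode: "some"; languages: string[] }; // one or more selected languages

interface PdfFile {
  filename: string;
  language: string; // assumed to be a language code such as "en" or "es"
}

// Returns only the PDFs the extractor should use for the given selection.
function selectPdfs(files: PdfFile[], selection: LanguageSelection): PdfFile[] {
  if (selection.mode === "all") {
    return files.slice(); // copy, so callers cannot mutate the input list
  }
  return files.filter(
    (file) => selection.languages.indexOf(file.language) !== -1
  );
}

const pdfs: PdfFile[] = [
  { filename: "a.pdf", language: "en" },
  { filename: "b.pdf", language: "es" },
  { filename: "c.pdf", language: "fr" },
];

// All languages, a single language, and multiple languages.
const everything = selectPdfs(pdfs, { mode: "all" });
const spanishOnly = selectPdfs(pdfs, { mode: "some", languages: ["es"] });
const englishAndFrench = selectPdfs(pdfs, { mode: "some", languages: ["en", "fr"] });
```

Treating a single language as a one-element list keeps the selection to two cases, and an explicit `"all"` mode avoids giving an empty list a special meaning.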