Open Mandalka opened 5 years ago
Implemented an additional, lower-priority file indexing queue that runs additional plugins such as OCR. Setting additional_plugins_later defers reindexing with those additional plugins until after all documents have first been indexed quickly without OCR.
Todo: add a UI option for this to the Web Admin config UI, so the ETL config no longer has to be edited in a text editor.
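A minimal sketch of how the two passes could pick their plugin lists. Only the option name additional_plugins_later comes from this issue; the plugin names, the `plugins` key, and the helper `plugins_for_pass` are hypothetical and for illustration only, not the actual ETL code.

```python
# Hypothetical ETL config: a plain dict, as a stand-in for the real config.
config = {}

# Plugins for the fast first pass (plugin names are assumptions).
config['plugins'] = ['enhance_extract_text_tika_server']

# Option named in this issue: plugins that only run in the later pass.
config['additional_plugins_later'] = ['enhance_ocr']


def plugins_for_pass(config, later=False):
    """Return the plugin list for the first (fast) or the later (slow) pass."""
    plugins = list(config.get('plugins', []))
    if later:
        # Append the deferred plugins, skipping any that are already listed.
        for plugin in config.get('additional_plugins_later', []):
            if plugin not in plugins:
                plugins.append(plugin)
    return plugins


first_pass = plugins_for_pass(config)
later_pass = plugins_for_pass(config, later=True)
```

Here the first pass would run only the text extraction plugin, while the later pass would run the same plugin plus the deferred OCR plugin.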
Implemented the config option additional_plugins_later_config, so we can not only add additional plugins but also reconfigure plugins that have already run.
This way we can disable Tika's OCR option on the first run and enable it on the second run of the same plugin, too.
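The override idea could look roughly like this. Only the option name additional_plugins_later_config is taken from this issue; the key `ocr` and the helper `config_for_later_pass` are assumptions made for the sketch, and the real option names may differ.

```python
# Hypothetical base config for the fast first pass.
config = {
    'plugins': ['enhance_extract_text_tika_server'],  # assumed plugin name
    'ocr': False,  # assumed key: OCR switched off on the first run
}

# Option named in this issue: settings that replace the base settings
# when the same plugins run again in the later, low-priority pass.
config['additional_plugins_later_config'] = {'ocr': True}


def config_for_later_pass(config):
    """Return a copy of the config with the later-pass overrides applied."""
    later = dict(config)  # shallow copy, the base config stays untouched
    later.update(config.get('additional_plugins_later_config', {}))
    return later


later_config = config_for_later_pass(config)
```

With this, the same Tika plugin would see `ocr` disabled on the first run and enabled on the second.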
Added UI option in Web Admin config UI.
The REST API for file indexing (used by file monitoring) now uses index_filedirectory for single files, too. This adds the file to multiple queues with different priorities if an option like "OCR later" is on.
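As an illustration of the queueing behaviour described above, here is a small in-memory sketch using Python's heapq as a stand-in for the real task queue. The function names `enqueue_file` and `drain`, the priority constants, and the `ocr` option are all hypothetical.

```python
import heapq
import itertools

FAST, LATER = 0, 1            # lower number is processed first
_counter = itertools.count()  # tie-breaker that keeps insertion order


def enqueue_file(queue, filename, ocr_later=True):
    """Queue a file for fast indexing and, optionally, for OCR later."""
    heapq.heappush(queue, (FAST, next(_counter), filename, {'ocr': False}))
    if ocr_later:
        # Second, low-priority entry for the same file, this time with OCR.
        heapq.heappush(queue, (LATER, next(_counter), filename, {'ocr': True}))


def drain(queue):
    """Yield (filename, options) in the order a worker would process them."""
    while queue:
        _priority, _order, filename, options = heapq.heappop(queue)
        yield filename, options


queue = []
enqueue_file(queue, 'a.pdf')
enqueue_file(queue, 'b.pdf')
order = list(drain(queue))
```

In this sketch both files are indexed without OCR first, and only then is each of them processed again with OCR.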
Todo: the same for OCR by the web page importer.
Implemented a UI to prioritize certain files for OCR, see https://github.com/opensemanticsearch/open-semantic-search/issues/251
Do automatic text recognition (OCR) for (embedded) images (much analysis time for little additional text) later and in the background, so most other data/documents, or most parts of the documents, become searchable many times faster.