opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0
254 stars 69 forks

Option to disable automatic reindexing when a new/additional plugin is configured #105

Closed opensemanticsearch closed 4 years ago

opensemanticsearch commented 4 years ago

Admins should be able to disable the automatic reindexing/reprocessing of files that happens when file system directories are recrawled after a new/additional plugin has been configured.

Mandalka commented 4 years ago

Implemented a new config option, do_not_reindex_because_plugin_yet_not_processed (an array of plugin names).

Example setting that prevents already indexed files from being automatically reindexed just because new plugins for email address and phone number extraction have been set up (new default settings):

config['do_not_reindex_because_plugin_yet_not_processed'] = ['enhance_extract_email', 'enhance_extract_phone']
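
To illustrate what the option does, here is a minimal sketch of the kind of check it could influence. This is not the actual ETL code: the function name plugins_requiring_reindex and the two plugin lists passed to it are hypothetical, and the only name taken from this issue is the config key itself. The sketch assumes that a file is reindexed when at least one configured plugin has not yet processed it, unless that plugin is listed in the new option.

```python
def plugins_requiring_reindex(configured_plugins, processed_plugins, config):
    """Return the configured plugins that would trigger a reindex of a file.

    Hypothetical helper for illustration only. A plugin triggers a reindex
    if it has not yet processed the file and is not listed in
    config['do_not_reindex_because_plugin_yet_not_processed'].
    """
    ignored = set(config.get('do_not_reindex_because_plugin_yet_not_processed', []))
    already_done = set(processed_plugins)
    return [
        plugin for plugin in configured_plugins
        if plugin not in already_done and plugin not in ignored
    ]


# Assumed example data: one plugin already ran on the file, two are new.
configured = ['enhance_extract_text_tika_server', 'enhance_extract_email', 'enhance_extract_phone']
processed = ['enhance_extract_text_tika_server']

# Without the option, both new plugins would cause the file to be reindexed.
without_option = plugins_requiring_reindex(configured, processed, {})

# With the option from this issue, neither new plugin forces a reindex.
config = {}
config['do_not_reindex_because_plugin_yet_not_processed'] = ['enhance_extract_email', 'enhance_extract_phone']
with_option = plugins_requiring_reindex(configured, processed, config)

print(without_option)
print(with_option)
```

In this sketch an empty result means the file is left alone on recrawl. Under that assumption, newly indexed or changed files would still be handled by the new plugins as usual, while files that are already indexed are not reprocessed only because those plugins were added.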