Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelines, with an ingestor to a Solr or Elasticsearch index and a linked data graph database.
If an additional plugin such as OCR has been enabled, reindex the files (run the ETL on them again) so that the new plugin's analysis is also applied to documents that are already indexed.
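The reason a reindex is needed can be illustrated with a minimal sketch. It assumes, purely for illustration, that each indexed document records which plugins were applied when it was processed; the function name `files_needing_reindex`, the plugin names and the file paths are hypothetical and are not part of this project's API.

```python
from typing import Dict, Iterable, List, Set


def files_needing_reindex(
    indexed: Dict[str, Set[str]],
    enabled_plugins: Iterable[str],
) -> List[str]:
    """Return paths of indexed files that lack at least one enabled plugin."""
    enabled = set(enabled_plugins)
    # A file is up to date only if every enabled plugin was already applied to it.
    return sorted(path for path, applied in indexed.items() if not enabled <= applied)


# Hypothetical index state: file path -> plugins applied at index time.
indexed = {
    "/docs/report.pdf": {"text_extraction"},
    "/docs/scan.tiff": {"text_extraction", "ocr"},
    "/docs/notes.txt": {"text_extraction"},
}

# OCR is now enabled, so files indexed without it have to go through the ETL again.
to_reindex = files_needing_reindex(indexed, ["text_extraction", "ocr"])
print(to_reindex)  # ['/docs/notes.txt', '/docs/report.pdf']
```

In this sketch only the files that were processed before OCR was enabled are selected. Without such bookkeeping, the simple option is to run the ETL again on all files.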