Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, with an ingestor to a Solr or Elasticsearch index and a linked data graph database
Make ETL runtime stats more granular by recording them per plugin (analogous to the runtime stats for text extraction by Tika), so that stats can be analyzed more easily for each plugin, tool, enrichment, or enhancement stage.
Implemented a separate runtime stat for each plugin, stored in integer fields named etl_pluginname_time_millis_i, which are shown in the UI in the preview tab "Document processing (ETL)".
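A minimal sketch of how per-plugin timing like this could work, not the project's actual code. It assumes each plugin is a callable that takes and returns a `(parameters, data)` pair; `run_plugins` and `example_plugin` are hypothetical names used only for illustration. Only the field name pattern etl_pluginname_time_millis_i comes from the note above.

```python
import time


def run_plugins(plugins, parameters, data):
    """Run each (name, callable) plugin in order and record its runtime.

    Hypothetical helper: the real ETL loop may be structured differently.
    """
    for name, process in plugins:
        start = time.monotonic()
        parameters, data = process(parameters, data)
        elapsed_millis = int((time.monotonic() - start) * 1000)
        # Field name pattern from the note above: etl_<pluginname>_time_millis_i
        data["etl_" + name + "_time_millis_i"] = elapsed_millis
    return parameters, data


def example_plugin(parameters, data):
    # Hypothetical stand-in for a real enrichment plugin
    data["title_s"] = parameters.get("id", "")
    return parameters, data


if __name__ == "__main__":
    _, result = run_plugins([("example_plugin", example_plugin)], {"id": "doc1"}, {})
    print(result)
```

Using `time.monotonic()` rather than `time.time()` keeps the measurement unaffected by system clock adjustments, and truncating to an integer matches the integer field type implied by the `_i` suffix.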