Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelines, with ingestors for Solr or Elasticsearch indexes and linked data graph databases
Move the list of technical metadata field names (e.g. pdf:docinfo:creator_tool) out of the plugin enhance_multilingual into a separate blacklist config, so that it can be reused by other plugins and by core methods such as gettext()
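A minimal sketch of what a shared blacklist could look like, assuming a plain-text config with one field name per line (blank lines and `#` comments ignored). The names `load_blacklist`, `load_blacklist_file`, `content_fields` and this `gettext` are hypothetical illustrations, not the project's actual API; the point is that enhance_multilingual and any other plugin or core method would read the same list instead of each hard-coding it.

```python
from typing import Iterable


def load_blacklist(lines: Iterable[str]) -> set[str]:
    """Parse blacklist lines (hypothetical format: one field name per line)."""
    fields = set()
    for line in lines:
        name = line.strip()
        # Skip blank lines and comment lines.
        if name and not name.startswith("#"):
            fields.add(name)
    return fields


def load_blacklist_file(path: str) -> set[str]:
    """Load the shared blacklist config from a file (path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return load_blacklist(f)


def content_fields(data: dict, blacklist: set[str]) -> dict:
    """Keep only fields that are not listed as technical metadata."""
    return {k: v for k, v in data.items() if k not in blacklist}


def gettext(data: dict, blacklist: set[str]) -> str:
    """Hypothetical stand-in for a core gettext(): join the string values
    (including strings inside lists) of all non-blacklisted fields."""
    parts = []
    for value in content_fields(data, blacklist).values():
        if isinstance(value, str):
            parts.append(value)
        elif isinstance(value, list):
            parts.extend(v for v in value if isinstance(v, str))
    return "\n".join(parts)


if __name__ == "__main__":
    # In a real setup both plugins would call load_blacklist_file() on the same path.
    blacklist = load_blacklist(["# technical metadata", "pdf:docinfo:creator_tool", ""])
    doc = {"title": "Report", "pdf:docinfo:creator_tool": "Writer", "content": ["Hello", "World"]}
    print(gettext(doc, blacklist))
```

Keeping the list in a single file means that adding a new technical field name (for example, another Tika metadata key) updates every consumer at once.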