Python-based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elasticsearch index & linked data graph database
Extracting text using Tika fails (error 400) unless line 142 is commented out in enhance_extract_text_tika_server.py:
I don't understand why that fixes it; perhaps it has something to do with the latest Python packages and Tika (version 2.1.0)? Running on Ubuntu 20.
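
To narrow this down, it might help to send a document to the Tika server directly, outside the plugin, and compare the response with and without any extra request headers. The following is only a rough sketch using the Python standard library. It assumes the Tika server is listening on its default address `http://localhost:9998` and uses the standard `PUT /tika` endpoint; `build_tika_request` and `extract_text` are hypothetical helper names, not functions from enhance_extract_text_tika_server.py.

```python
import urllib.error
import urllib.request

# Assumption: Tika server runs locally on its default port 9998.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data, extra_headers=None, url=TIKA_URL):
    """Build a PUT request that asks the Tika server for plain text.

    extra_headers is a hypothetical hook for adding the headers under
    suspicion one at a time.
    """
    headers = {"Accept": "text/plain"}
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def extract_text(data, extra_headers=None, url=TIKA_URL):
    """Send the document bytes and return (HTTP status, response body)."""
    request = build_tika_request(data, extra_headers, url)
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as error:
        # A 400 ends up here; the body may contain the server's explanation.
        body = error.read().decode("utf-8", errors="replace")
        return error.code, body
```

Calling `extract_text(open("sample.pdf", "rb").read())` with a running server (where `sample.pdf` is any test file) returns the status code and body. If the plain request succeeds but adding a particular header via `extra_headers` produces the 400, that would point at whatever the commented-out line sends.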