Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
spaCy NER has a text size limit of one million characters (the default value of nlp.max_length).
If the extracted plain text is longer than that, it should be split into segments, with a separate spaCy NER call for each segment.
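
The segmentation could look roughly like the sketch below. It is not the project's implementation: the helper names `split_text` and `extract_entities` and the `max_chars` parameter are hypothetical, and `nlp` is assumed to be an already loaded spaCy pipeline (any callable returning a document with `.ents` works). Segments are cut at whitespace where possible so that words are not split, and entity offsets are shifted back to positions in the full text.

```python
def split_text(text, max_chars=1_000_000):
    """Split text into (offset, segment) tuples of at most max_chars characters.

    Prefers to cut after the last space or newline inside the window so words
    are not cut in half; falls back to a hard cut if there is no whitespace.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    segments = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for a whitespace break point inside the current window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left segment
        segments.append((start, text[start:end]))
        start = end
    return segments


def extract_entities(nlp, text, max_chars=1_000_000):
    """Run NER once per segment and return entities with full-text offsets.

    `nlp` is assumed to be a loaded spaCy pipeline (hypothetical usage:
    nlp = spacy.load("en_core_web_sm")).
    """
    entities = []
    for offset, segment in split_text(text, max_chars):
        doc = nlp(segment)
        for ent in doc.ents:
            entities.append(
                (ent.text, ent.label_, offset + ent.start_char, offset + ent.end_char)
            )
    return entities
```

One limitation of this approach: an entity that happens to span a segment boundary can be missed or truncated. Cutting at sentence or paragraph boundaries instead of arbitrary whitespace, or using a `max_chars` well below the limit to reduce memory use, may be preferable in practice.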