opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

Spacy NER text size limit: Segmented NER of longer text #83

Open opensemanticsearch opened 5 years ago

opensemanticsearch commented 5 years ago

The Spacy NER text size limit is one million characters (the default value of `nlp.max_length`).

If the extracted plain text is longer than that, it should be segmented, with a separate Spacy NER call for each segment.
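The idea could be sketched roughly as follows. This is only an illustration, not the code from this repository: `split_text`, `segmented_ner`, `spacy_ner` and the `Entity` tuple shape are hypothetical names, and cutting at the last whitespace inside each window is an assumption about what a reasonable segment boundary is.

```python
from typing import Callable, Iterator, List, Tuple

# (entity text, label, start offset, end offset): hypothetical result shape
Entity = Tuple[str, str, int, int]

SPACY_DEFAULT_MAX_LENGTH = 1_000_000  # spaCy's default nlp.max_length


def split_text(text: str,
               max_length: int = SPACY_DEFAULT_MAX_LENGTH
               ) -> Iterator[Tuple[int, str]]:
    """Yield (offset, segment) pairs with len(segment) <= max_length.

    Cuts at the last whitespace inside the window where possible, so that
    single words are not split in the middle.
    """
    if max_length < 1:
        raise ValueError("max_length must be positive")
    start = 0
    while start < len(text):
        end = min(start + max_length, len(text))
        if end < len(text):
            # Position of the last space, newline or tab in the window.
            cut = max(text.rfind(ws, start, end) for ws in (" ", "\n", "\t"))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left segment
        yield start, text[start:end]
        start = end


def segmented_ner(text: str,
                  ner: Callable[[str], List[Entity]],
                  max_length: int = SPACY_DEFAULT_MAX_LENGTH) -> List[Entity]:
    """Run `ner` once per segment and shift offsets back to the full text."""
    entities: List[Entity] = []
    for offset, segment in split_text(text, max_length):
        for ent_text, label, seg_start, seg_end in ner(segment):
            entities.append((ent_text, label,
                             seg_start + offset, seg_end + offset))
    return entities


def spacy_ner(nlp) -> Callable[[str], List[Entity]]:
    """Wrap an already loaded spaCy pipeline as an `ner` callable."""
    def run(segment: str) -> List[Entity]:
        doc = nlp(segment)
        return [(e.text, e.label_, e.start_char, e.end_char)
                for e in doc.ents]
    return run
```

With a loaded pipeline `nlp`, a call would look like `segmented_ner(long_text, spacy_ner(nlp))`. A multi-word entity that happens to straddle a cut can still be missed or truncated; cutting at sentence or paragraph boundaries would reduce that risk.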

Mandalka commented 1 year ago

Solved by https://github.com/opensemanticsearch/spacy-services/pull/3. Thanks!

Todo: Document the new environment variable SPACY_MAX_LENGTH in the config documentation.
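Until that documentation exists, here is a rough sketch of how such a variable might be read. This thread does not say how spacy-services interprets SPACY_MAX_LENGTH, so everything here is an assumption: that the value is an integer number of characters, that spaCy's default of 1,000,000 is the fallback, and the helper name `read_max_length`.

```python
import os
from typing import Mapping

DEFAULT_MAX_LENGTH = 1_000_000  # spaCy's default nlp.max_length


def read_max_length(env: Mapping[str, str] = os.environ) -> int:
    """Return the limit from SPACY_MAX_LENGTH, or the default if unset.

    Assumption: the variable holds a positive integer (characters).
    """
    raw = env.get("SPACY_MAX_LENGTH", "").strip()
    if not raw:
        return DEFAULT_MAX_LENGTH
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 1:
        raise ValueError("SPACY_MAX_LENGTH must be a positive integer")
    return value
```

The result could then be assigned to `nlp.max_length` on a loaded pipeline, or used as the segment size when splitting long text.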