opensemanticsearch / open-semantic-etl

Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0
254 stars 69 forks

Segmentation of PDF to single pages #64

opensemanticsearch closed 6 years ago

opensemanticsearch commented 6 years ago

Open-source the ETL component that segments massive PDFs into single pages

Mandalka commented 6 years ago

Done by the ETL plugin enhance_pdf_page.py.
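
For readers who want to see what page-wise segmentation can look like, here is a minimal sketch. It is not the code of enhance_pdf_page.py; it assumes the `pdfseparate` tool from poppler-utils is installed, and the function names and the `-page-%d.pdf` output naming are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_split_command(pdf_path, output_dir, first_page=None, last_page=None):
    """Build the pdfseparate argument list that writes one PDF per page.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    pdf_path = Path(pdf_path)
    # pdfseparate replaces %d in the output pattern with the page number.
    pattern = Path(output_dir) / f"{pdf_path.stem}-page-%d.pdf"
    command = ["pdfseparate"]
    if first_page is not None:
        command += ["-f", str(first_page)]
    if last_page is not None:
        command += ["-l", str(last_page)]
    command += [str(pdf_path), str(pattern)]
    return command


def split_pdf_to_pages(pdf_path, output_dir, first_page=None, last_page=None):
    """Split a PDF into single-page files and return them in page order."""
    if shutil.which("pdfseparate") is None:
        raise RuntimeError("pdfseparate (poppler-utils) was not found on PATH")
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    command = build_split_command(pdf_path, output_dir, first_page, last_page)
    # check=True raises CalledProcessError if pdfseparate exits with an error.
    subprocess.run(command, check=True)
    prefix = f"{Path(pdf_path).stem}-page-"
    pages = [
        p for p in output_dir.iterdir()
        if p.suffix == ".pdf"
        and p.name.startswith(prefix)
        and p.stem[len(prefix):].isdigit()
    ]
    # Sort numerically so that page 10 comes after page 9, not after page 1.
    return sorted(pages, key=lambda p: int(p.stem[len(prefix):]))
```

Each resulting single-page file could then be passed through the usual text extraction and indexing steps as its own document, which keeps memory use bounded for very large PDFs.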