opensemanticsearch / open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
https://opensemanticsearch.org
GNU General Public License v3.0

Preview: Table of contents (TOC) / chapters / pages of documents / digitized books #102

Open opensemanticsearch opened 6 years ago

opensemanticsearch commented 6 years ago

Preview: More fine-grained navigation, filtering and browsing based on the table of contents, chapters and pages of a document, for example for massive PDFs with hundreds of pages or digitized books.
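A minimal sketch of what chapter-based navigation could build on, assuming the table of contents is already available as (title, start page) pairs. The names `Chapter`, `chapters_from_toc` and `chapter_for_page` are hypothetical and not part of Open Semantic Search; the idea is only to turn TOC entries into page ranges so that a page hit can be mapped back to its chapter.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Chapter(NamedTuple):
    title: str
    page_start: int  # 1-based, inclusive
    page_end: int    # 1-based, inclusive


def chapters_from_toc(toc, total_pages):
    """Turn (title, start_page) TOC entries into chapters with page ranges.

    Assumption: each chapter runs until the page before the next chapter
    starts, and the last chapter runs to the end of the document.
    """
    entries = sorted(toc, key=lambda entry: entry[1])
    chapters = []
    for i, (title, start) in enumerate(entries):
        if i + 1 < len(entries):
            end = entries[i + 1][1] - 1
        else:
            end = total_pages
        chapters.append(Chapter(title, start, end))
    return chapters


def chapter_for_page(chapters, page) -> Optional[Chapter]:
    """Return the chapter containing the given page, or None if there is none."""
    starts = [c.page_start for c in chapters]
    i = bisect_right(starts, page) - 1
    if i < 0 or page > chapters[i].page_end:
        return None
    return chapters[i]


# Example data (made up for illustration)
toc = [("Introduction", 1), ("Methods", 12), ("Results", 40)]
chapters = chapters_from_toc(toc, total_pages=60)
print(chapter_for_page(chapters, 39))
```

With page ranges like these, a search hit on a single page could be shown under its chapter, or a chapter could be used as a filter over a range of pages.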

opensemanticsearch commented 6 years ago

Any recommendations for an existing standard ontology for the structure of books, such as table of contents / chapters?

Considering http://schema.org/Book and http://bib.schema.org/Chapter ...

Named entities such as persons, organizations and places will be analyzed by https://github.com/opensemanticsearch/open-semantic-entity-search-api & https://github.com/opensemanticsearch/open-semantic-search-apps, where schema.org structure/properties are already used for named entities.
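To make the schema.org option more concrete, here is a rough sketch of how a book with chapters might be described as JSON-LD using the `Book` and `Chapter` types mentioned above. The helper `book_to_jsonld` and the example data are hypothetical; the properties used (`name`, `numberOfPages`, `hasPart`, `position`, `pageStart`, `pageEnd`) are schema.org terms, but whether this mapping fits the project is an open question, not a decision from this thread.

```python
import json


def book_to_jsonld(title, number_of_pages, chapters):
    """Describe a book and its chapters with schema.org terms.

    chapters: list of (name, page_start, page_end) tuples in reading order.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "numberOfPages": number_of_pages,
        # Chapters are attached as parts of the book
        "hasPart": [
            {
                "@type": "Chapter",
                "name": name,
                "position": position,
                "pageStart": page_start,
                "pageEnd": page_end,
            }
            for position, (name, page_start, page_end) in enumerate(chapters, start=1)
        ],
    }


# Example data (made up for illustration)
doc = book_to_jsonld(
    "Example Book",
    60,
    [("Introduction", 1, 11), ("Methods", 12, 39), ("Results", 40, 60)],
)
print(json.dumps(doc, indent=2))
```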

opensemanticsearch commented 6 years ago

Add a UI element to the web config UI (https://github.com/opensemanticsearch/open-semantic-search/issues/121) for easier setup of the ETL plugin that segments PDFs into single pages (https://github.com/opensemanticsearch/open-semantic-etl/issues/64).

Mandalka commented 6 years ago

Implemented a new web config UI tab, "Segmentation", to enable the ETL plugin that segments PDFs into single pages.
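For readers wondering what page segmentation means for the index, the following is only an illustrative sketch, not the actual ETL plugin code. It assumes the page texts have already been extracted, and the function name `segment_into_pages` and the field names `id`, `container`, `page` and `content` are hypothetical. Each page becomes its own record that points back to the source document, so a hit can be shown per page.

```python
def segment_into_pages(doc_uri, page_texts):
    """Build one record per page from already extracted page texts.

    Assumption: page numbers are 1-based and the page record id is the
    document URI plus a '#page=N' fragment.
    """
    return [
        {
            "id": f"{doc_uri}#page={number}",
            "container": doc_uri,  # the document this page belongs to
            "page": number,
            "content": text,
        }
        for number, text in enumerate(page_texts, start=1)
    ]


# Example data (made up for illustration)
records = segment_into_pages(
    "file:///docs/report.pdf",
    ["Text of the first page", "Text of the second page"],
)
for record in records:
    print(record["id"], record["page"])
```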