opensemanticsearch / open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
https://opensemanticsearch.org
GNU General Public License v3.0

Import bibliographic metadata #106

Open trenkert opened 6 years ago

trenkert commented 6 years ago

Some of the documents I organize in Open Semantic Search have bibliographic metadata associated with them. Could you include a plugin or function for searching and importing metadata (perhaps similar to how Zotero does it)?

This metadata could be useful for stuff like

Mandalka commented 6 years ago

The Open Semantic ETL architecture can do such things through data enrichment, so most of this can be implemented in a few lines of Python, provided that the metadata is in an open standard (and/or there are good Python libraries to read it) and that it contains an ID that can be mapped to the ID, URL, or filename of the document.
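A minimal sketch of that mapping idea, not the actual Open Semantic ETL plugin interface: the function name `enrich_with_bibliography`, the `records` lookup table keyed by file name, and the `bib_` field prefix are all assumptions made for illustration.

```python
import posixpath
from urllib.parse import unquote, urlparse


def enrich_with_bibliography(doc_id, data, records):
    """Merge the bibliographic record matching doc_id into data.

    doc_id  -- document ID, URL, or path (e.g. "file:///docs/a.pdf")
    data    -- dict of fields already extracted for the document
    records -- hypothetical lookup: file name -> dict of bibliographic fields
    """
    # Reduce the ID/URL/path to a bare file name to use as the join key.
    filename = posixpath.basename(unquote(urlparse(doc_id).path))

    record = records.get(filename)
    if record is None:
        return data  # no bibliographic metadata known for this document

    for field, value in record.items():
        # Prefix is an assumption; existing fields are never overwritten.
        data.setdefault("bib_" + field, value)
    return data


# Example with made-up data
records = {"smith2015.pdf": {"title": "A Study", "year": "2015"}}
data = enrich_with_bibliography(
    "file:///docs/papers/smith2015.pdf",
    {"content_type": "application/pdf"},
    records,
)
```

The only hard requirement is the one named above: something in the metadata (here the file name) has to be mappable to the document.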

For example, renaming can already be done on the file name / URI / URL with the default plugin for mapping file names and directories to other paths or web server URLs.

Which format is the metadata in?

trenkert commented 6 years ago

> Which format is the metadata in?

Programs like Zotero, Mendeley or cb2bib fetch metadata from Google Scholar for PDFs that appear to be (scientific) publications and store it as BibTeX, for instance. (If you want to see it in action, just drag and drop a couple of PDFs into Zotero.)

OKFN maintains BibJSON, an implementation of BibTeX as JSON: http://okfnlabs.org/bibjson/ This should make it possible to store and share BibTeX metadata on a server, to render it in the UI, and to make it exportable (either raw or in different citation styles).
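As a rough illustration of what such a record could look like, here is a sketch that turns already-parsed BibTeX fields into a BibJSON-style dict. The function name `bibtex_fields_to_bibjson` is made up, the input is assumed to come from whatever BibTeX parser is used, and the output layout (authors as a list of objects with `name`, the journal as an object, the DOI under `identifier`) follows my reading of the BibJSON conventions rather than a validated schema.

```python
import json


def bibtex_fields_to_bibjson(entry_type, cite_key, fields):
    """Convert one parsed BibTeX entry into a BibJSON-style record.

    entry_type -- e.g. "article"
    cite_key   -- the BibTeX citation key
    fields     -- dict of BibTeX field name -> string value
    """
    record = {"type": entry_type, "id": cite_key}
    for name, value in fields.items():
        if name == "author":
            # BibTeX separates multiple authors with the word "and".
            record["author"] = [{"name": a.strip()} for a in value.split(" and ")]
        elif name == "journal":
            record["journal"] = {"name": value}
        elif name == "doi":
            record.setdefault("identifier", []).append({"type": "doi", "id": value})
        else:
            record[name] = value
    return record


# Example with made-up data
record = bibtex_fields_to_bibjson(
    "article",
    "smith2015",
    {
        "author": "Smith, Jane and Doe, John",
        "title": "A Study",
        "journal": "Journal of Examples",
        "year": "2015",
        "doi": "10.1234/example",
    },
)
print(json.dumps(record, indent=2))
```

A dict like this could be stored alongside the document and later rendered in the UI or exported again.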