opensemanticsearch / open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections, built as a Semantic Search Engine and Open Source Text Mining & Text Analytics platform (integrates ETL for document processing, OCR for images & PDFs, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, and a search user interface & search apps for full-text search, faceted search & knowledge graph)
https://opensemanticsearch.org
GNU General Public License v3.0

bug regarding the mapping of file paths? #456

Open josefkarlkraus opened 1 year ago

josefkarlkraus commented 1 year ago

Hey,

First of all, this is an amazing piece of work! While indexing our server, we may have encountered a bug in the mapping of file paths:

Our current mapping is defined in /etc/opensemanticsearch/connector-files, with the aim of making search results directly accessible through an Apache server:

config['mappings'] = { "/mnt/server/": "http://192.168.2.20/server/" }

Additionally, we created a symlink:

ln -s /mnt/server /var/www/html/server

So far this works well for nearly every file.
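The mapping above is intended to act as a simple prefix substitution from local paths to URLs. A minimal sketch of that behavior, assuming longest-prefix matching; the function name map_path_to_url is hypothetical and this is not the connector's actual implementation:

```python
# Hypothetical sketch: map a local file path to a URL using a prefix table
# shaped like config['mappings'] in /etc/opensemanticsearch/connector-files.
# This is NOT Open Semantic Search's actual implementation.


def map_path_to_url(path: str, mappings: dict[str, str]) -> str:
    """Replace the longest matching local prefix with its URL prefix.

    Paths that match no prefix are returned unchanged.
    """
    # Try longer prefixes first so nested mappings win over broader ones.
    for local_prefix in sorted(mappings, key=len, reverse=True):
        if path.startswith(local_prefix):
            return mappings[local_prefix] + path[len(local_prefix):]
    return path


mappings = {"/mnt/server/": "http://192.168.2.20/server/"}

print(map_path_to_url("/mnt/server/reports/q1.pdf", mappings))
# → http://192.168.2.20/server/reports/q1.pdf
```

Under this sketch, an attachment would only be mapped correctly if the path stored for it also begins with exactly "/mnt/server/".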

However, for files (such as PDFs) inside e-mails (.msg) or e-mail archives (*.pst), i.e. for every e-mail attachment, the mapping results in http://192.168.2.20/mnt/server/... instead of http://192.168.2.20/server/...
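The wrong prefix also likely means such links do not resolve on the web server. A minimal sketch of how a URL path lands on disk, assuming Apache's default DocumentRoot /var/www/html and that symlinks are followed; served_path, DOCUMENT_ROOT and SYMLINKS are hypothetical names that simulate the lookup without touching the disk:

```python
# Sketch: which filesystem path would Apache serve for a given URL?
# Assumes DocumentRoot /var/www/html and Options FollowSymLinks; both are
# assumptions about this particular setup, not confirmed by the issue.
import posixpath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = "/var/www/html"  # assumed Apache DocumentRoot
# Mirrors `ln -s /mnt/server /var/www/html/server` from the issue.
SYMLINKS = {"/var/www/html/server": "/mnt/server"}


def served_path(url: str) -> str:
    """Return the filesystem path a URL maps to under DOCUMENT_ROOT."""
    url_path = unquote(urlsplit(url).path)
    fs_path = posixpath.normpath(DOCUMENT_ROOT + url_path)
    # Resolve the symlink by hand so the example runs without a real filesystem.
    for link, target in SYMLINKS.items():
        if fs_path == link or fs_path.startswith(link + "/"):
            return target + fs_path[len(link):]
    return fs_path


print(served_path("http://192.168.2.20/server/mails/a.pdf"))
# → /mnt/server/mails/a.pdf
print(served_path("http://192.168.2.20/mnt/server/mails/a.pdf"))
# → /var/www/html/mnt/server/mails/a.pdf
```

The second URL points below /var/www/html/mnt/, which lies outside the symlinked directory and, unless that directory happens to exist, would return a 404.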

So the question is: did we make a mistake in the configuration, or is there a bug in how attachments are handled?

Best regards Josef

josefkarlkraus commented 1 year ago

PS: The same mapping bug also occurs, for example, with JPG files embedded in XLS files.