opensemanticsearch / open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
https://opensemanticsearch.org
GNU General Public License v3.0

Web crawl options #95

Open scaleoutsean opened 6 years ago

scaleoutsean commented 6 years ago

Issues:

1) The web crawler refuses to crawl sites with invalid certificates. Ideally this would be configurable through the CLI or the settings, but it is not.
2) /usr/bin/opensemanticsearch-index-web refers to a nonexistent settings file:

# Do not edit config here! Overwrite options in /etc/opensemanticsearch/connector-web

$ ll /etc/opensemanticsearch/connector-web
ls: cannot access '/etc/opensemanticsearch/connector-web': No such file or directory

gsmvenus commented 5 years ago

Same issue here: cannot access '/etc/opensemanticsearch/connector-web': No such file or directory

The file is missing.

mmoossen commented 2 years ago

  1. This has nothing to do with OSS; it is a matter of the OS. For instance, in Debian:
     1.1 Copy your certificate, in PEM format and with a .crt extension, to /usr/local/share/ca-certificates/
     1.2 Run update-ca-certificates and restart
  2. It seems to be fixed in oss_21.01.03
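
The Debian steps above could be wrapped roughly as follows. This is only a sketch: the function name `install_ca_cert` and the `CA_DIR` and `SKIP_UPDATE` variables are hypothetical helpers invented here so the copy step can be tried in a scratch directory, and the PEM check is a simple header match rather than real validation. The comment does not say what to restart, so that step is left out.

```shell
#!/bin/sh
# Sketch of the Debian procedure: copy a PEM certificate with a .crt
# extension into the local CA directory, then refresh the trust store.
# install_ca_cert, CA_DIR and SKIP_UPDATE are hypothetical names.
install_ca_cert() {
    src="$1"
    # Default is the directory named in the thread; CA_DIR overrides it.
    ca_dir="${CA_DIR:-/usr/local/share/ca-certificates}"

    # Rough PEM check: the file must contain a certificate header line.
    if ! grep -q -e '-----BEGIN CERTIFICATE-----' "$src"; then
        echo "not a PEM certificate: $src" >&2
        return 1
    fi

    # Keep the base name but force the .crt extension.
    base="$(basename "$src")"
    cp "$src" "$ca_dir/${base%.*}.crt" || return 1

    # Rebuild the system trust store unless explicitly skipped.
    if [ -z "$SKIP_UPDATE" ]; then
        update-ca-certificates
    fi
}
```

On a real system this would be run as root, for example `install_ca_cert ./my-ca.pem` (the file name is a placeholder), followed by whatever restart the setup needs.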