typesense / typesense-docsearch-scraper

A fork of Algolia's awesome DocSearch Scraper, customized to index data in Typesense (an open source alternative to Algolia)
https://typesense.org/docs/guide/docsearch.html

Docker scrape fails with x-typesense-api-key header issue #8

Closed: jbremmer closed this issue 2 years ago.

jbremmer commented 2 years ago

Description

The scraper does not start; it crashes with "Forbidden - a valid `x-typesense-api-key` header must be sent."

Steps to reproduce

Run with the following Docker command:

docker run -it --env-file=./.env -e "CONFIG=$(cat ./config.json | jq -r tostring)" typesense/docsearch-scraper
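For reference, the `$(cat ./config.json | jq -r tostring)` part flattens the JSON file into a single compact line so it can be passed as the `CONFIG` environment variable. If `jq` is not available, a rough Python equivalent is sketched below; `minify_config` is a hypothetical helper name, and its output may differ from `jq` in minor escaping details.

```python
import json


def minify_config(raw: str) -> str:
    """Re-serialize JSON text as one compact line, similar to `jq -r tostring`."""
    # separators=(",", ":") drops the spaces json.dumps adds by default;
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \u escapes.
    return json.dumps(json.loads(raw), separators=(",", ":"), ensure_ascii=False)
```

The result can then be passed as `-e "CONFIG=..."` exactly like the `jq` output.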

where ./.env is:

TYPESENSE_API_KEY=2...f
TYPESENSE_HOST=<host>
TYPESENSE_PORT=80
TYPESENSE_PROTOCOL=http
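Because the scraper only sees what Docker passes through `--env-file`, a quick pre-flight check that all four variables from this `.env` are present and non-empty can rule out the simplest cause of a missing key. The sketch below is a simplified parser (it skips blank lines and `#` comments and splits on the first `=`); it does not reproduce every Docker env-file rule, and `parse_env_file` and `missing_vars` are hypothetical helper names.

```python
from typing import Dict, List

# Variables this issue's .env sets for the scraper.
REQUIRED = ("TYPESENSE_API_KEY", "TYPESENSE_HOST", "TYPESENSE_PORT", "TYPESENSE_PROTOCOL")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (simplified)."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            env[key.strip()] = value
    return env


def missing_vars(env: Dict[str, str]) -> List[str]:
    """Return required variable names that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

Note that a key passing this check can still be rejected by the server if it was never created there, which is what turned out to be the case in this issue.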

where ./config.json is:

{
    "index_name": "typesense_docs",
    "start_urls": [
        {
            "url": "https://typesense.org/docs/(?P<version>.*?)/",
            "variables": {
                "version": [
                    "0.22.2",
                    "0.22.1",
                    "0.22.0",
                    "0.21.0",
                    "0.20.0",
                    "0.19.0",
                    "0.18.0",
                    "0.17.0",
                    "0.16.1",
                    "0.16.0",
                    "0.15.0",
                    "0.14.0",
                    "0.13.0",
                    "0.12.0",
                    "0.11.2"
                ]
            }
        }
    ],
    "selectors": {
        "default": {
            "lvl0": ".content__default h1",
            "lvl1": ".content__default h2",
            "lvl2": ".content__default h3",
            "lvl3": ".content__default h4",
            "lvl4": ".content__default h5",
            "text": ".content__default p, .content__default ul li, .content__default table tbody tr"
        }
    },
    "scrape_start_urls": false,
    "strip_chars": " .,;:#"
}
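The `start_urls` entry uses a named regex group, `(?P<version>.*?)`, whose values come from `variables`. As a simplified illustration of how such a templated URL fans out into one concrete start URL per version (not the scraper's actual implementation; `expand_start_url` is a hypothetical name, and the pattern does not handle nested parentheses inside a group):

```python
import itertools
import re
from typing import Dict, List

# Matches a simple named group such as (?P<version>.*?); nested parentheses are not supported.
NAMED_GROUP = re.compile(r"\(\?P<(\w+)>[^)]*\)")


def expand_start_url(url: str, variables: Dict[str, List[str]]) -> List[str]:
    """Substitute every combination of variable values into the URL's named groups."""
    names = list(dict.fromkeys(NAMED_GROUP.findall(url)))  # unique names, in order
    if not names:
        return [url]
    urls = []
    for combo in itertools.product(*(variables[name] for name in names)):
        values = dict(zip(names, combo))
        urls.append(NAMED_GROUP.sub(lambda m: values[m.group(1)], url))
    return urls
```

With the config above, this yields 15 URLs, from `https://typesense.org/docs/0.22.2/` through `https://typesense.org/docs/0.11.2/`.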

Expected Behavior

The scraper runs and starts scraping.

Actual Behavior

The scraper crashes with:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 116, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 43, in run_config
    typesense_helper.create_tmp_collection()
  File "/root/src/typesense_helper.py", line 29, in create_tmp_collection
    self.typesense_client.collections[self.collection_name_tmp].delete()
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/collection.py", line 22, in delete
    return self.api_call.delete(self._endpoint_path())
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 159, in delete
    params=params, timeout=self.config.connection_timeout_seconds)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 115, in make_request
    raise ApiCall.get_exception(r.status_code)(r.status_code, error_message)
typesense.exceptions.RequestUnauthorized: [Errno 401] Forbidden - a valid `x-typesense-api-key` header must be sent.

Metadata

Version: 0.22.2

OS: N/A

jasonbosco commented 2 years ago

Could you make a GET /keys request against the Typesense server to confirm that this API key actually exists on the server?
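The 401 in the traceback means the server rejected the key the scraper sent, so listing the keys the server knows about is a reasonable first check. A minimal sketch using only the Python standard library, assuming the same protocol/host/port values as the `.env` above and a key that is allowed to list keys (typically the server's bootstrap admin key); `keys_request` and `list_keys` are hypothetical helper names:

```python
import json
import urllib.request


def keys_request(protocol: str, host: str, port: str, api_key: str) -> urllib.request.Request:
    """Build a GET /keys request authenticated with the given API key."""
    url = f"{protocol}://{host}:{port}/keys"
    # Typesense reads the key from the X-TYPESENSE-API-KEY header.
    return urllib.request.Request(url, headers={"X-TYPESENSE-API-KEY": api_key}, method="GET")


def list_keys(req: urllib.request.Request, timeout: float = 5.0):
    """Send the request and return the decoded JSON body.

    Raises urllib.error.HTTPError on a non-2xx response (e.g. 401).
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `list_keys(keys_request("http", "<host>", "80", admin_key))` returns the parsed JSON response, or raises `urllib.error.HTTPError` if the admin key itself is rejected.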

jbremmer commented 2 years ago

Installing the key correctly fixed the issue, thanks @jasonbosco!