typesense / typesense-docsearch-scraper

A fork of Algolia's awesome DocSearch Scraper, customized to index data in Typesense (an open source alternative to Algolia)
https://typesense.org/docs/guide/docsearch.html

Failed to establish a connection for localhost #6

Closed artt closed 2 years ago

artt commented 2 years ago

Description

I started a Typesense server using the steps outlined here, with this Docker command:

export TYPESENSE_API_KEY=xyz
mkdir /tmp/typesense-data

docker run -p 8108:8108 -v/tmp/typesense-data:/data typesense/typesense:0.22.1 \
  --data-dir /data --api-key=$TYPESENSE_API_KEY --enable-cors

The health check, curl http://localhost:8108/health, returns ok.
I then ran the scraper:

docker run -it --env-file=.env -e "CONFIG=$(cat ./docsearch.json | jq -r tostring)" typesense/docsearch-scraper

with the following in my .env file:

TYPESENSE_API_KEY=xyz
TYPESENSE_HOST=localhost
TYPESENSE_PORT=8108
TYPESENSE_PROTOCOL=http

I get the following:

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 170, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.6/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
    conn = self._new_conn()
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x40092e5550>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8108): Max retries exceeded with url: /collections/th_1643792349 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x40092e5550>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 116, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 43, in run_config
    typesense_helper.create_tmp_collection()
  File "/root/src/typesense_helper.py", line 29, in create_tmp_collection
    self.typesense_client.collections[self.collection_name_tmp].delete()
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/collection.py", line 22, in delete
    return self.api_call.delete(self._endpoint_path())
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 159, in delete
    params=params, timeout=self.config.connection_timeout_seconds)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 129, in make_request
    raise last_exception
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 103, in make_request
    r = fn(url, headers={ApiCall.API_KEY_HEADER_NAME: self.config.api_key}, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/api.py", line 161, in delete
    return request('delete', url, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8108): Max retries exceeded with url: /collections/th_1643792349 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x40092e5550>: Failed to establish a new connection: [Errno 111] Connection refused',))

Metadata

Typesense Version: 0.22.1

OS: macOS 12 on Apple Silicon

Note: I can run Algolia's DocSearch (to Algolia's server) fine.

jasonbosco commented 2 years ago

@artt I built the Typesense docsearch docker image on an Intel machine. That's the reason you see that issue on an ARM machine. Could you try running it under emulation using Rosetta2 as described here: https://stackoverflow.com/a/67680194/123545

augustluhrs commented 2 years ago

Hi, I'm running into the same issue. It doesn't show the same platform warning, but I tried running it with the emulation flag anyway and still got the same error log.

I suspect it has something to do with not having set up a collection correctly, but I'm confused because the installation guide doesn't mention anything about that. I tried using the API and the typesense-cli tool to create an empty collection with the same name as the index in my config.json file, but that didn't change the error. What's the proper way to set up a collection so that the scraper has the correct URL to connect to (which is what I'm guessing the error is here)?

Thanks!

docker run -it --platform linux/amd64 --env-file=.env -e "CONFIG=$(cat devportal.config.json | jq -r tostring)" typesense/docsearch-scraper
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 170, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.6/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 200, in connect
    conn = self._new_conn()
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connection.py", line 182, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fe0deae5390>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8108): Max retries exceeded with url: /collections/devportal_1644520986 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe0deae5390>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 116, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 43, in run_config
    typesense_helper.create_tmp_collection()
  File "/root/src/typesense_helper.py", line 29, in create_tmp_collection
    self.typesense_client.collections[self.collection_name_tmp].delete()
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/collection.py", line 22, in delete
    return self.api_call.delete(self._endpoint_path())
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 159, in delete
    params=params, timeout=self.config.connection_timeout_seconds)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 129, in make_request
    raise last_exception
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 103, in make_request
    r = fn(url, headers={ApiCall.API_KEY_HEADER_NAME: self.config.api_key}, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/api.py", line 161, in delete
    return request('delete', url, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8108): Max retries exceeded with url: /collections/devportal_1644520986 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe0deae5390>: Failed to establish a new connection: [Errno 111] Connection refused',))

jasonbosco commented 2 years ago

Ahhh, I just realized this: when you use Docker to run the scraper and have TYPESENSE_HOST=localhost in your .env file, the scraper running inside the Docker container looks for Typesense inside that same container (localhost). That's why it's unable to connect.

So you want to use a hostname or IP address for TYPESENSE_HOST that refers to the Typesense process that is running outside the docker container - localhost will not work. More context here: https://stackoverflow.com/questions/24319662/from-inside-of-a-docker-container-how-do-i-connect-to-the-localhost-of-the-mach
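
To illustrate the point above, here is a minimal sketch of a reachability check that could be run from inside the scraper container to see which hostname actually reaches Typesense. The function name `can_connect` is hypothetical and not part of the scraper; the only value taken from this thread is port 8108.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts, and DNS failures.
        return False


# Example call (hostname is whatever you plan to put in TYPESENSE_HOST):
#   can_connect("localhost", 8108)
```

If this returns False for a given hostname from inside the container, the scraper will most likely fail with the same "Connection refused" error for that TYPESENSE_HOST value.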

jasonbosco commented 2 years ago

Separately, @augustluhrs - the scraper takes care of creating the collection for you. So you don't have to pre-create it.

augustluhrs commented 2 years ago

Thanks @jasonbosco, super good to know! That sent me down a rabbit hole of investigation. Unfortunately I'm running on a Mac, so adding the --network="host" flag to the Typesense server won't work. Instead, I changed TYPESENSE_HOST to host.docker.internal, and that seemed to do the trick.

After that I ran into the same error as #8, which was strange because I had confirmed that the API key I was using existed, but when I checked again it was gone. I'm not sure whether it expired or whether I misunderstood how the server stores API keys. Either way, creating a new admin key and updating the scraper got it working.
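
For anyone checking whether a key still exists or has expired, a rough sketch follows. It assumes the Typesense key management API as I understand it: `GET /keys` with an `X-TYPESENSE-API-KEY` header, returning a `keys` list whose entries carry `id` and `expires_at` (a Unix timestamp). The helper names `fetch_keys` and `expired_key_ids` are hypothetical, and nothing in this thread establishes why the key actually disappeared.

```python
import json
import urllib.request


def fetch_keys(base_url: str, admin_key: str) -> dict:
    """GET /keys from a Typesense server and return the parsed JSON body."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/keys",
        headers={"X-TYPESENSE-API-KEY": admin_key},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def expired_key_ids(payload: dict, now: float) -> list:
    """Return the ids of keys whose expires_at timestamp is at or before `now`."""
    return [
        key["id"]
        for key in payload.get("keys", [])
        if key.get("expires_at") is not None and key["expires_at"] <= now
    ]
```

For example, `expired_key_ids(fetch_keys("http://localhost:8108", "xyz"), time.time())` (after `import time`) would list expired key ids on the server from the original post, assuming `xyz` has permission to list keys.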

Thanks so much!