lanegoolsby closed 1 year ago
Yes definitely. I had this on my todo list to tackle later, but later never came... until now unfortunately.
I've now updated the release process to release tagged images going forward.
Meanwhile, I tried building the 0.3.4 code base, but it looks like there have been some deprecations, so I had to make some small tweaks to get it to build again. I've pushed this out as typesense/docsearch-scraper:0.3.5. Could you give that a shot and let me know if it works as expected?
Now I am getting HTTP timeouts. I verified our cluster is up and healthy.
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.6/http/client.py", line 1377, in getresponse
response.begin()
File "/usr/lib/python3.6/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.6/http/client.py", line 281, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.6/socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.6/ssl.py", line 1012, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.6/ssl.py", line 874, in read
return self._sslobj.read(len, buffer)
File "/usr/lib/python3.6/ssl.py", line 631, in read
v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 447, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/urllib3/connectionpool.py", line 337, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='our.typensense.url.com, port=443): Read timed out. (read timeout=3.0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/src/index.py", line 116, in <module>
run_config(environ['CONFIG'])
File "/root/src/index.py", line 43, in run_config
typesense_helper.create_tmp_collection()
File "/root/src/typesense_helper.py", line 30, in create_tmp_collection
self.typesense_client.collections[self.collection_name_tmp].delete()
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/collection.py", line 22, in delete
return self.api_call.delete(self._endpoint_path())
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 159, in delete
params=params, timeout=self.config.connection_timeout_seconds)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 129, in make_request
raise last_exception
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 103, in make_request
r = fn(url, headers={ApiCall.API_KEY_HEADER_NAME: self.config.api_key}, **kwargs)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/api.py", line 161, in delete
return request('delete', url, **kwargs)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='our.typensense.url.com, port=443): Read timed out. (read timeout=3.0)
Exited with code exit status 1
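The last frames of the traceback show the request timeout coming from the Typesense Python client's connection_timeout_seconds setting, which was 3.0 seconds here. As a rough sketch only, this is how a client configuration with a longer timeout could be built from environment variables. TYPESENSE_HOST and TYPESENSE_API_KEY appear in this thread; TYPESENSE_PORT, TYPESENSE_PROTOCOL, the client_config helper, and the 10 second value are assumptions for illustration, and nothing here confirms that the scraper exposes this setting.

```python
import os


def client_config(environ=os.environ, timeout_seconds=10.0):
    """Build a Typesense client config dict with a longer read timeout.

    TYPESENSE_PORT and TYPESENSE_PROTOCOL are assumed variable names;
    the defaults (443, https) match the HTTPS host seen in the traceback.
    """
    return {
        "nodes": [{
            "host": environ["TYPESENSE_HOST"],
            "port": environ.get("TYPESENSE_PORT", "443"),
            "protocol": environ.get("TYPESENSE_PROTOCOL", "https"),
        }],
        "api_key": environ["TYPESENSE_API_KEY"],
        # The traceback shows this value being passed as the request timeout.
        "connection_timeout_seconds": timeout_seconds,
    }


# Usage (requires the typesense package):
#   import typesense
#   client = typesense.Client(client_config())
```

A longer timeout only helps if the server is reachable but slow to respond; it does not fix a networking problem.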
Could you exec bash into the container and try curling your Typesense host's health endpoint? This feels like some sort of docker network setup issue...
I get a 301 if I curl from the CircleCI host.
(note: tsUrl is just a temporary env var I created that is equal to our.typensense.url.com above)
I am able to hit the health endpoint from my local machine.
Everything was working yesterday so this too is almost certainly related to the image.
That's so strange! Wonder where the 301 is coming from.
Could you share the output of curl -svo /dev/null $tsUrl?
lol, forgot https://, mea culpa.
I can curl from the CircleCI pod
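For anyone who wants to script the same check, here is a minimal sketch using only the Python standard library. It adds https:// when the scheme is missing, since curl falls back to plain HTTP without one, which is the likely source of the 301 above. The helper names health_url and is_healthy are made up for this example; /health is the Typesense health endpoint, which returns {"ok": true} on a healthy node.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def health_url(host, default_scheme="https"):
    """Return the /health URL for a host, adding a scheme if one is missing."""
    host = host.strip()
    if "://" not in host:
        host = f"{default_scheme}://{host}"
    parts = urlsplit(host)
    return f"{parts.scheme}://{parts.netloc}/health"


def is_healthy(host, timeout=3.0):
    """GET the health endpoint and report whether the body says ok is true."""
    with urllib.request.urlopen(health_url(host), timeout=timeout) as resp:
        return resp.status == 200 and json.load(resp).get("ok") is True


# Usage, run from inside the container:
#   import os
#   print(is_healthy(os.environ["tsUrl"]))
```

A timeout or connection error raises an exception instead of returning False, which keeps the failure reason visible in the CI log.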
Would you be able to share a minimal CircleCI configuration that replicates the issue? I'm curious to see how the scraper image gets referenced, whether it's using a machine executor or a docker executor, how the networking is set up, etc.
Also, do you use the scraper image directly, or do you build another image based off of the scraper image and use that?
(I still haven't ruled out that this is an issue with the image - I just want to be able to replicate it consistently on my side to understand the root cause).
We have the crawler built into a CircleCI orb so I'm translating it to a 'normal' job as best I can. It may require some tweaking.
executors:
  typesense:
    docker:
      - image: $hubUrl/typesense/docsearch-scraper #:0.3.5
commands:
  crawl:
    parameters:
      apiKey:
        type: env_var_name
        description: Environment variable for the Typesense key
        default: TYPESENSE_API_KEY
      config:
        type: string
        description: Path to a JSON file that tells the crawler how to parse a site's structure.
      env:
        type: enum
        default: np
        enum: ["np", "prod"]
        description: Typesense environment, defaults to "np" (non-prod)
    steps:
      - run:
          name: Install dependencies
          command: |
            apt-get update && apt-get install -y git openssh-client
      - run:
          name: Crawl the site
          command: |
            export TYPESENSE_HOST=$([[ << parameters.env >> = "prod" ]] && echo "prodUrl" || echo "nonProdUrl")
            export TYPESENSE_API_KEY="$<< parameters.apiKey >>"
            export CONFIG=$(cat << parameters.config >>)
            cd /root
            pipenv run python -m src.index
jobs:
  crawl_nonprod:
    executor: typesense
    steps:
      - crawl:
          apiKey: superSecure
          config: /path/to/yadda.json
          env: np
Okay, after a bit of futzing and cussing I was able to get things working on my end with the 0.3.5 rollback!
I think Circle was holding on to a previous attempt of me trying to use sudo for the apt install or something.
The issue in #27 still persists but I can at least unblock my pipelines now!
Phew glad to hear that! 😅
I'll keep you posted on the other one.
Several of our deployment pipelines are currently hung because of #27. However, we can't roll back to the previous version because only the latest version of the scraper is published to Docker.
Please adjust the deployment process so that historical versions are persisted. That way consumers can roll back in the event of a problem.