typesense / typesense-docsearch-scraper

A fork of Algolia's awesome DocSearch Scraper, customized to index data in Typesense (an open source alternative to Algolia)
https://typesense.org/docs/guide/docsearch.html

Unable to install dependencies when using scraper image in CircleCI #27

Open lanegoolsby opened 1 year ago

lanegoolsby commented 1 year ago

Description

It appears that after yesterday's change any attempt to install dependencies onto the scraper image fails due to lack of permissions.

In order for us to use the crawler we have to install some of our custom CA certs. Otherwise the crawler will not trust the certificates our internal sites use and therefore cannot connect via HTTPS.

We haven't changed anything on our end for many months and run several crawls a day. We ran several crawls yesterday and they all worked. Suddenly today we're seeing this error in CircleCI. image

Steps to reproduce

  1. Create a Dockerfile
FROM typesense/docsearch-scraper
RUN apt-get update && apt-get install -y git openssh-client
# RUN curl -sSLk https://[our internal site]/install-certs | bash -
# RUN cat /usr/local/share/ca-certificates/* > /usr/local/share/ca_bundle.pem
# ENV REQUESTS_CA_BUNDLE "/usr/local/share/ca_bundle.pem"

You'll see this: image

I have tried running the command with sudo but it fails as well.

lanegoolsby commented 1 year ago

I'm starting to think the whole image is corrupt.

I was able to work around the permission issue by setting the docker image to run as root. That allowed the deps to be installed. But when it came time to run the crawl, this error was thrown:

Creating a virtualenv for this project...
Pipfile: /root/Pipfile
Using /usr/bin/python3 (3.10.6) to create virtualenv...
Creating virtual environment...created virtual environment CPython3.10.6.final.0-64 in 680ms
  creator CPython3Posix(dest=/root/.local/share/virtualenvs/root-BuDEOXnJ, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==23.0, setuptools==67.1.0, wheel==0.38.4
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

Successfully created virtual environment!
  Creating virtual environment...
Virtualenv location: /root/.local/share/virtualenvs/root-BuDEOXnJ
Creating a Pipfile for this project...
/root/.local/share/virtualenvs/root-BuDEOXnJ/bin/python: Error while finding module specification for 'src.index' (ModuleNotFoundError: No module named 'src')

Exited with code exit status 1
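
For reference, one way to get the root workaround mentioned above is a `USER root` line ahead of the install step in the Dockerfile from the reproduction steps. This is only a sketch, assuming the permission error comes from the image now defaulting to a non-root user; that assumption is not confirmed anywhere in this thread.

```dockerfile
FROM typesense/docsearch-scraper

# Assumption: the image now defaults to a non-root user, so switch to root
# before touching system packages.
USER root

RUN apt-get update && apt-get install -y git openssh-client
```

As the log above shows, the install succeeds this way but the crawl still fails with the ModuleNotFoundError, so this is not a complete fix.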

lanegoolsby commented 1 year ago

FYI - in the process of working through #28 I found a spot where we were setting the working directory to /app. I changed that to be ~/app and while it didn't solve the problem it did show an error that more closely jibes with my earlier screenshots.

image

jasonbosco commented 1 year ago

May I know what version of Docker engine you're using?

lanegoolsby commented 1 year ago

Locally I am using Rancher Desktop. docker -v generates: Docker version 20.10.21-rd, build ac29474. I'm not sure what version of Docker is being used by CircleCI but it should be fairly recent.

lanegoolsby commented 1 year ago

I just tried pulling latest and got a slightly different error. Figured I'd share in case it's helpful.

sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
sudo: a password is required
The command '/bin/sh -c curl -sSLk https://url.to.a/script | bash -' returned a non-zero code: 1