slimtoolkit / slim

Slim(toolkit): Don't change anything in your container image and minify it by up to 30x (and for compiled languages even more) making it secure too! (free and open source)
Apache License 2.0

How to enable dynamic HTTP requests? #158

Closed. openedhardware closed this issue 3 years ago

openedhardware commented 4 years ago

I have a Python container that downloads files from various remote URLs, including GitHub and AWS S3.

Here is my build command:

docker-slim build --http-probe=false --include-path /usr/local/lib/python3.7/dist-packages/certifi --show-clogs object-detector

I had to manually include the certifi path, which the requests package uses for its CA certificates.
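Rather than hard-coding the dist-packages path, the directory of an installed package can be looked up from Python and then passed to --include-path. This is a minimal sketch, assuming it is run inside the original (non-slim) image; `package_dir` is a hypothetical helper name, not part of docker-slim or requests.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory an importable package is loaded from, or None."""
    spec = importlib.util.find_spec(name)
    # Built-in modules and missing packages have no file location to include.
    if spec is None or not spec.has_location:
        return None
    return os.path.dirname(spec.origin)


# Example: print the path to feed to --include-path (None if not installed).
print(package_dir("certifi"))
```

The same lookup can be repeated for any other package whose data files turn out to be missing from the minified image.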

I then used the following command to start the slimified container:

docker run --privileged --rm -d --network host object-detector.slim

But I am getting this error:

Error processing line 1 of /usr/local/lib/python3.7/dist-packages/protobuf-3.12.2-py3.7-nspkg.pth:

  Traceback (most recent call last):
    File "/usr/lib/python3.7/site.py", line 174, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 580, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
[setupvars.sh] OpenVINO environment initialized
Error processing line 1 of /usr/local/lib/python3.7/dist-packages/protobuf-3.12.2-py3.7-nspkg.pth:

  Traceback (most recent call last):
    File "/usr/lib/python3.7/site.py", line 174, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 580, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.7/dist-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
OSError: [Errno 16] Device or resource busy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 976, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connection.py", line 308, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connection.py", line 172, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f9440b91110>: Failed to establish a new connection: [Errno 16] Device or resource busy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/dist-packages/urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /pjreddie/darknet/master/data/coco.names (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f9440b91110>: Failed to establish a new connection: [Errno 16] Device or resource busy'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/src/app/src/main.py", line 62, in run
    model_url=model_url if self.config.get('public_model', True) is False else None)
  File "/usr/src/app/src/utils/models.py", line 160, in download_model
    r = requests.get(COCO_LABELS_URL)
  File "/usr/local/lib/python3.7/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /pjreddie/darknet/master/data/coco.names (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f9440b91110>: Failed to establish a new connection: [Errno 16] Device or resource busy'))

The file it fails to download is raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names, whose URL is configured as a constant value.

What do I have to do to resolve the OSError: [Errno 16] Device or resource busy issue?
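The traceback shows the failure originating in socket.getaddrinfo, that is, during name resolution and before any HTTP request is sent. One way to narrow this down is to run the same small check in both the original and the minified container and compare the results. The sketch below works under the assumption (not confirmed in this thread) that files used for name resolution may be absent from the minified image; the file list and the helper names `missing_files` and `try_resolve` are illustrative.

```python
import os
import socket

# Files that glibc-based systems commonly consult for name resolution.
# This list is an assumption for illustration, not taken from the thread.
RESOLVER_FILES = ["/etc/resolv.conf", "/etc/nsswitch.conf", "/etc/hosts"]


def missing_files(paths):
    """Return the subset of paths that do not exist in this filesystem."""
    return [p for p in paths if not os.path.exists(p)]


def try_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Attempt a lookup; return (True, None) or (False, error text)."""
    try:
        resolver(host, port, 0, socket.SOCK_STREAM)
        return True, None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
```

Calling `missing_files(RESOLVER_FILES)` and `try_resolve("raw.githubusercontent.com")` in each container shows whether the lookup itself fails in the minified image and whether any of the listed files differ between the two.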

Cheers.

kcq commented 4 years ago

Thank you @openedhardware for providing all this background info! This will be super helpful trying to repro the condition.

A couple of extra questions... Do you use the multiprocessing or threading packages by any chance? Where do you store the downloaded files? Are they local to the container? Are they stored on a volume or maybe on some kind of network file system?

Also curious... why did you do docker run with --privileged and --network host?

openedhardware commented 4 years ago

@kcq

Thanks for your quick reply!

  1. Yeah, I am using threading in many classes, and the download URL is defined in a Python file as a constant:

    COCO_LABELS_URL = "https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names"

    But some other URLs are created dynamically, e.g.

    S3_BASE_URL = "https://public-storage.s3.eu-central-1.amazonaws.com/OD/{}/{}.tar.xz"
    download_url = S3_BASE_URL.format(first_variable, second_variable)
  2. Downloaded files are stored in a volume-mounted directory. Could that be a problem?

  3. --privileged is used to access the webcam and --network host is used for the local node-red server.
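Whether a URL is a constant or built at runtime, the request still comes down to a hostname lookup followed by a connection, so listing the hosts involved makes it easier to see what the container has to reach. This sketch uses the two URLs quoted above; `build_download_url` and `hosts_contacted` are hypothetical helper names, and the argument values are placeholders.

```python
from urllib.parse import urlsplit

COCO_LABELS_URL = "https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names"
S3_BASE_URL = "https://public-storage.s3.eu-central-1.amazonaws.com/OD/{}/{}.tar.xz"


def build_download_url(first_variable, second_variable):
    """Fill the two placeholders of the S3 URL template."""
    return S3_BASE_URL.format(first_variable, second_variable)


def hosts_contacted(urls):
    """Return the sorted, de-duplicated hostnames the given URLs point at."""
    return sorted({urlsplit(u).hostname for u in urls})


# "model-a" and "v1" are placeholder values, not real object names.
urls = [COCO_LABELS_URL, build_download_url("model-a", "v1")]
print(hosts_contacted(urls))
```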

Best Regards

kcq commented 4 years ago

Thanks for the clarifications @openedhardware ! One more question :) What base image are you using?

openedhardware commented 4 years ago

@kcq

FROM python:3.7-slim-buster

USER root

RUN apt-get update && apt-get install -y build-essential cmake apt-transport-https ca-certificates curl libgomp1 usbutils gnupg2 python3.7-dev wget

# OpenCV & Dlib dependencies
RUN apt-get install -y libgtk2.0-dev libgtk-3-dev libboost-all-dev

RUN pip3 install -U pip setuptools

WORKDIR /usr/src/app

COPY . ./

RUN pip3 install -r requirements.txt

CMD ["python3", "/usr/src/app/src/main.py"]

This is my Dockerfile.

Thanks!

openedhardware commented 4 years ago

@kcq Here is the content of the requirements.txt file:

redis
getmac
netifaces
requests
numpy
networkx==2.3
paho-mqtt
opencv-contrib-python
dlib
PyYAML
flake8
tensorflow==1.13.1

Could it be that the protobuf package (pulled in by tensorflow) isn't installed correctly?

kcq commented 4 years ago

@openedhardware thank you! curious... how are you using tensorflow and opencv?

kcq commented 4 years ago

working on a basic repro app... i'll keep expanding until it breaks :)

kcq commented 4 years ago

yes, something is going on with the protobuf package... it's most likely related to its non-python components

openedhardware commented 4 years ago

Hi, @kcq

Thanks for your help!

Yeah, the non-Python components are suffering... I had to spend a day figuring out the requests library issue.

OpenCV & TF: I normally use them to detect a person from the webcam.

Cheers

kcq commented 4 years ago

@openedhardware what's the relationship between the logic that downloads the files and opencv/TF?

openedhardware commented 4 years ago

@kcq That logic downloads the trained model files and label files.

I assume docker-slim should work with any HTTP API call?

I also have another container that scrapes a bunch of websites, and I can't specify in advance how many URLs it will use, because it downloads the target list from our server as well.

Is there any restriction? I am a newbie to docker-slim, so I have no idea how it works internally.

Thanks for your help! 👍

openedhardware commented 4 years ago

@kcq

I have opened a separate issue for this, since the failure actually occurs when downloading a static file: https://github.com/docker-slim/docker-slim/issues/159

Thanks!