rdnfn / beobench

A toolkit providing easy and unified access to building control environments for reinforcement learning (RL).
https://beobench.readthedocs.io
MIT License

docker.errors.DockerException while running beobench run --config config.yaml #82

Closed HYDesmondLiu closed 2 years ago

HYDesmondLiu commented 2 years ago

First, thank you for sharing the code. I was trying to run the example provided in the readme section, beobench run --config config.yaml, using the provided config.yaml and agent.py.

However, I got this docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied')) error.

Please let me know if any further information is needed to debug.

Environment: Python 3.8.5, Ubuntu 18.04.3 LTS

Detailed message:

`Beobench ⚡️Starting experiment run ...
Beobench ⚡️Running experiment in container with environment MixedUseFanFCU-v0 and agent from ./agent.py. Sample 1 of 1.
Beobench ⚡️Recognised integration named sinergym.
Traceback (most recent call last):
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
PermissionError: [Errno 13] Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 724, in urlopen
    retries = retries.increment(
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/home/hsinyu/anaconda3/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', PermissionError(13, 'Permission denied'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/api/client.py", line 237, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', PermissionError(13, 'Permission denied'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hsinyu/anaconda3/bin/beobench", line 8, in <module>
    sys.exit(cli())
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/beobench/cli.py", line 122, in run
    beobench.experiment.scheduler.run(
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/beobench/experiment/scheduler.py", line 172, in run
    _build_and_run_in_container(config)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/beobench/experiment/scheduler.py", line 202, in _build_and_run_in_container
    image_tag = beobench.experiment.containers.build_experiment_container(
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/beobench/experiment/containers.py", line 91, in build_experiment_container
    if not force_build and check_image_exists(stage2_image_tag):
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/beobench/experiment/containers.py", line 23, in check_image_exists
    client = docker.from_env()
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/client.py", line 96, in from_env
    return cls(
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/client.py", line 45, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
  File "/home/hsinyu/anaconda3/lib/python3.8/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))`
rdnfn commented 2 years ago

Hi Desmond, thank you for trying Beobench and sharing the error!

It is hard to tell exactly why you get this error, but my first guess would be that your docker installation is not fully set up. Did you follow the Linux post-installation steps described here: https://beobench.readthedocs.io/en/latest/guides/installation_linux.html?

From that page, there would be two potential ways that might fix your docker permission issue:

  1. Always use sudo in front of beobench commands to grant the relevant privileges required for docker (note that this has not been tested)
  2. Recommended: follow the official post-installation steps to manage docker as a non-root user to enable running docker without sudo. As the linked documentation points out, this carries a certain security risk.

Did you try either of them?

Let me know if that helps or if you have any other questions/problems!
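
The permission error in the traceback is raised while connecting to docker's Unix socket. As a quick check before and after changing group membership, the following minimal sketch reports whether the current user can read and write that socket. It assumes the default socket path /var/run/docker.sock (an assumption; your daemon may be configured differently), and the function name docker_socket_status is made up for illustration.

```python
import os


def docker_socket_status(path="/var/run/docker.sock"):
    """Classify access to a docker Unix socket as 'missing', 'denied', or 'ok'.

    The default path is an assumption; adjust it if your daemon uses another one.
    """
    if not os.path.exists(path):
        # Daemon not running, or the socket lives somewhere else.
        return "missing"
    # The docker client needs both read and write access to talk to the daemon.
    if os.access(path, os.R_OK | os.W_OK):
        return "ok"
    # Typical when the current user is not in the group that owns the socket.
    return "denied"


if __name__ == "__main__":
    print(docker_socket_status())
```

If this prints denied, one of the two options above is the likely fix. Group membership changes usually only take effect after logging out and back in.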

EDIT: I also just noticed that the output seems to indicate that you set the environment name in the config to MixedUseFanFCU-v0, while the gym framework is set to sinergym. MixedUseFanFCU-v0 is an energym environment, so I recommend changing the environment to a sinergym one like Eplus-5Zone-hot-continuous-v1. For example, you could have the following in your config.yaml file (note that this issue is separate from the bug you shared):

env:
  # gym framework from which we want to use an environment
  gym: sinergym
  # gym-specific environment configuration
  config:
    # sinergym environment name
    name: Eplus-5Zone-hot-continuous-v1
    # whether to normalise observations
    normalize: True
HYDesmondLiu commented 2 years ago

@rdnfn Thank you for the prompt reply. Indeed, I was previously not in the docker user group. However, after being added to the group, I got another error.

Also, I use the same agent.py and config.yaml as the examples provided in the readme page. The env name is Eplus-5Zone-hot-continuous-v1. I am not sure where the other environment (MixedUseFanFCU-v0) is set.

[Screenshots: Screen Shot 2022-06-30 at 9 46 14 AM; Screen Shot 2022-06-30 at 9 49 24 AM]
rdnfn commented 2 years ago

@HYDesmondLiu Glad to hear that the first error is fixed.

And you're absolutely right, the wrong env name in the log is not your fault but rather a bug in the logging code in Beobench (wouldn't affect your experiment itself though). Sorry about the confusion. I fixed it already in the dev/general branch and that fix will be added in the next release.

About the docker unknown shorthand flag error: I am having a hard time reproducing it. Would you be able to share your docker version? You can find it with the command docker --version. Thanks!

HYDesmondLiu commented 2 years ago

@rdnfn Thanks for the prompt reply. My docker version is listed below: Docker version 19.03.12, build 48a66213fe

rdnfn commented 2 years ago

Thanks for your patience and for sharing the version! Good news: I have now been able to reproduce the error and, hopefully, to fix it. Please update to the latest Beobench version (v0.5.2) using pip install beobench --upgrade. Let me know if this update resolves your problem!
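
To confirm which Beobench version is actually installed after upgrading, a small check using the standard library can help. This is a sketch that assumes Python 3.8 or newer (where importlib.metadata is available); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package="beobench"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed Beobench version, or None if it is not installed.
    print(installed_version())
```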

Background: The problem appears to be that your version of docker (v19.03) does not support the docker buildx subcommand, which Beobench (v0.5.1) uses by default to build experiment container images. The problem is not easy to spot because it gets hidden behind the unknown shorthand flag error (see output below). As buildx is not required for all use cases, I have now disabled it where possible (that is, when not on ARM64 architecture).

For future reference, output from the tests I ran:

/ # docker --version
Docker version 19.03.15, build 99e3ed8
/ # docker buildx build
docker: 'buildx' is not a docker command.
See 'docker --help'
/ # docker buildx build -t
unknown shorthand flag: 't' in -t
See 'docker --help'.

Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
      --config string      Location of client config files (default "/root/.docker")

    <more help output>

Run inside the following test container: docker run -it --entrypoint /bin/sh docker:19.03-dind
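
The buildx check described in the background note can be sketched in Python. This is not Beobench's actual implementation; it is a minimal illustration that assumes docker buildx version exits with a non-zero status when the buildx subcommand is unavailable, and that ARM64 machines report either arm64 or aarch64. All function names are hypothetical.

```python
import platform
import subprocess


def buildx_available(docker_cmd="docker"):
    """Return True if `<docker_cmd> buildx version` exits successfully."""
    try:
        result = subprocess.run(
            [docker_cmd, "buildx", "version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # No usable docker executable was found (or it could not be started).
        return False
    return result.returncode == 0


def is_arm64(machine=None):
    """Return True for the usual ARM64 machine names ('arm64', 'aarch64')."""
    machine = platform.machine() if machine is None else machine
    return machine.lower() in {"arm64", "aarch64"}


def use_buildx(machine=None, docker_cmd="docker"):
    """Hypothetical policy: use buildx only on ARM64, and only if it exists."""
    return is_arm64(machine) and buildx_available(docker_cmd)


if __name__ == "__main__":
    print("use buildx:", use_buildx())
```

Checking availability explicitly surfaces a clearer message than the unknown shorthand flag error shown in the output above.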

HYDesmondLiu commented 2 years ago

Hi @rdnfn, thanks for the quick fix. After upgrading to beobench v0.5.2 and running the same command, beobench run --config config.yaml, I got another error, shown in the screenshot below. It seems to be a problem related to pip install.

[Screenshot: Screen Shot 2022-07-02 at 8 26 04 AM]
david-woelfle commented 2 years ago

Can confirm, same problem here.

rdnfn commented 2 years ago

@david-woelfle and @HYDesmondLiu thanks for raising this problem! I am working on it ... will release a fix soon.

rdnfn commented 2 years ago

Thanks again @HYDesmondLiu and @david-woelfle for finding this error! Thanks for your patience!

I have now updated the development version of Beobench with a fix for this issue. You can install the latest development version using the command:

pip install git+https://github.com/rdnfn/beobench.git@dev/general

Could you try installing this and let me know whether it resolves the error on your machines? If so, I will publish a new version (v0.5.3) with this fix. Thanks so much for your help making Beobench better!

Background: I used a form of conditional statement inside the Dockerfile, but this appears to have broken when removing the use of buildx in v0.5.2. Thus, I moved this logic directly into Python in the dev version.

I have also "yanked" v0.5.2 on PyPI (that is, marked it as faulty). As a result, new users should no longer run into this issue, since v0.5.1 is considered the latest version again.

Note: The fix took me so long because I ran into another, unrelated bug in the GitHub CI related to the use of Beobench inside docker-in-docker containers. I think this is unlikely to affect you, but if you're using this kind of setup (dind), have a look at #85.

EDIT: There was a problem with the internal Beobench version checking due to marking the pypi package as yanked, apologies for that! This is now fixed.

david-woelfle commented 2 years ago

Hi @rdnfn and thank you for working on this issue. I have tried the pip install command for the development version, but it didn't work. What did work was installing beobench version 0.5.1.

HYDesmondLiu commented 2 years ago

Hi @rdnfn, thanks for the quick fix. This version works for me. However, are you planning to add more content to the advanced usage page? I have some questions about advanced usage, for example:

  1. I tried to invoke my own algorithm in the agent script, however, it cannot recognize modules in the same directory.
  2. Also, the first YAML configuration on that page (with Energym) does not work. If I run it, it shows the error messages in this screenshot: [Screenshot: Screen Shot 2022-07-07 at 10 07 22 PM]

    If I use pdb to debug, it shows an error similar to the one in the original bug report.

    [Screenshot: Screen Shot 2022-07-07 at 10 08 36 PM]

    Please let me know whether I should open a new issue or whether we can discuss this here. Thanks.

rdnfn commented 2 years ago

Thanks for testing this @HYDesmondLiu! Glad to hear this works now.

About your other points:

However, are you planning to add more content to advanced usage page?

Absolutely, this is work in progress!

  1. I tried to invoke my own algorithm in the agent script, however, it cannot recognize modules in the same directory.

This is unfortunately not supported yet, but I am hoping to add this feature soon. #70 tracks the mounting of the directory of the agent script (e.g. access to files in the same folder as the agent script and its subfolders). #71 tracks the installation of agent script specific external pypi dependencies (e.g. pip install packagename).

  2. Also, the first yaml configuration in the page with Energym does not work. If I run the yaml, it shows error messages as this:

Thanks for flagging this. This error is likely because of an env.config setting. Would you be able to open another issue for this problem with a copy of the exact YAML file that you used?

With that, I will close this specific issue now; the changes will be included in the next release (v0.5.3). Feel free to open new issues for any other problems or questions you run into!

HYDesmondLiu commented 2 years ago

@rdnfn Thanks for the detailed responses. I am glad these points will be addressed in an upcoming revision. Looking forward to it.