twosixlabs / armory

ARMORY Adversarial Robustness Evaluation Test Bed
MIT License

Problems with GPU mode #482

Closed ambarpal closed 4 years ago

ambarpal commented 4 years ago

I am unable to run armory in GPU mode on a system with CUDA 10.0 and Python 3.6, using the twosixarmory/pytorch:0.6.0 Docker container. Running the ucf101_baseline_pretrained config with the default settings produces no error, but enabling the gpu flag leads to the following error trace:

2020-04-22 15:18:36 io51 armory.paths[20629] INFO Creating armory directories if they do not exist
2020-04-22 15:18:37 io51 root[20629] INFO Downloading external repo: yusong-tan/MARS
2020-04-22 15:18:37 io51 armory.eval.evaluator[20629] ERROR Starting instance failed.
Traceback (most recent call last):
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status
    response.raise_for_status()
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.40/containers/3f5bca271b7b2de197f2cad179178f8e836ffe0584949eb61dda45aef3d0d52b/start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cis/home/ambar/anaconda3/envs/gard-py3.6/lib/python3.6/site-packages/armory/eval/evaluator.py", line 154, in run
    container_subdir=self.container_subdir,
  File "/cis/home/ambar/anaconda3/envs/gard-py3.6/lib/python3.6/site-packages/armory/docker/management.py", line 108, in start_armory_instance
    container_subdir=container_subdir,
  File "/cis/home/ambar/anaconda3/envs/gard-py3.6/lib/python3.6/site-packages/armory/docker/management.py", line 71, in __init__
    image_name, **container_args
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/docker/models/containers.py", line 809, in run
    container.start()
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/docker/models/containers.py", line 400, in start
    return self.client.api.start(self.id, **kwargs)
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/docker/api/container.py", line 1095, in start
    self._raise_for_status(res)
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/docker/api/client.py", line 263, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/cis/home/ambar/.local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/3f5bca271b7b2de197f2cad179178f8e836ffe0584949eb61dda45aef3d0d52b/log.json: no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: unknown")
2020-04-22 15:18:38 io51 armory.eval.evaluator[20629] ERROR Is Docker Daemon running?
2020-04-22 15:18:38 io51 armory.eval.evaluator[20629] INFO Deleting tmp_dir /cis/home/ambar/my_documents/docker-data/video-defense/gard-testbed/tmp/2020-04-22T19-18-36.880104
2020-04-22 15:18:38 io51 armory.eval.evaluator[20629] INFO Removing output_dir /cis/home/ambar/my_documents/docker-data/video-defense/gard-testbed/outputs/2020-04-22T19-18-36.880104 if empty

armory seems to be using nvidia-container-runtime (related issue elsewhere), which is not available on this system. Is this package a requirement for armory? Any help is greatly appreciated!
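For anyone hitting the same trace: the final APIError says the executable "nvidia-container-runtime" was not found in $PATH. A minimal sketch of a pre-flight check for that condition is below. The helper name `runtime_on_path` is hypothetical and not part of armory. It only inspects the PATH of the current process, which may differ from the PATH seen by the Docker daemon, so treat the result as a hint rather than a guarantee.

```python
import shutil


def runtime_on_path(executable: str = "nvidia-container-runtime") -> bool:
    """Return True if `executable` can be found on this process's PATH.

    Hypothetical helper for illustration; the default name is taken from
    the error message in the trace above.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if runtime_on_path():
        print("nvidia-container-runtime found on PATH")
    else:
        print("nvidia-container-runtime not found on PATH; GPU mode will likely fail")
```

Running this on the host before launching armory with the gpu flag would surface the missing package without waiting for the container start to fail.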

seanpmorgan commented 4 years ago

Hi @ambarpal. Yes, the nvidia-docker runtime is required to use GPU Docker containers. See this issue for more information: https://github.com/twosixlabs/armory/issues/157

In short, it shouldn't be required, since Docker 19.03+ has native GPU support, but the docker-py SDK has yet to merge support for it. That issue should have instructions for getting it working.
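To make the version cutoff concrete: native GPU support (the `--gpus` flag) arrived in Docker 19.03. The sketch below checks a Docker version string against that cutoff. The function name `supports_native_gpu` is hypothetical and the parsing is an assumption about typical version strings such as "19.03.8"; it is not how armory or docker-py decides anything.

```python
import re


def supports_native_gpu(server_version: str) -> bool:
    """Return True if a Docker version string is 19.03 or newer.

    Hypothetical helper for illustration. Only the leading
    "major.minor" part is examined; any suffix is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", server_version)
    if match is None:
        # Unrecognized format: be conservative and report no support.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (19, 3)
```

Even on a new enough Docker, a tool that drives containers through the docker-py SDK can only use this path once the SDK exposes it, which is why the nvidia runtime is still needed here.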

seanpmorgan commented 4 years ago

Closing as this is a duplicate of #157. Please feel free to re-open / continue discussion if that does not solve your issue. Thanks!