nickjj / ansible-docker

Install / Configure Docker and Docker Compose using Ansible.
MIT License

Expected behavior for DockerHub private repositories login options? #105

Closed: mkarlesky closed this issue 3 years ago

mkarlesky commented 3 years ago

I'm not seeing the behavior I expect from a Docker registry configuration, but I'm unsure both of what the role's behavior should be and of whether I've configured the role correctly.

I've interpreted the role's login and registry features as leaving the remote system ready to run docker pull against the configured registry, for instance through subsequent use of Ansible's docker_container module. In my case I want to pull private images from DockerHub.

The following configuration for the role executes successfully, and afterwards I find a .docker/config.json file in the sudo user's home directory with an authorization token in it. I think this means everything is configured properly. However, when I try to use private Docker images in playbook runs following the role execution, I receive errors accessing those private images.

Everything works fine if I add a docker_login task with the same credentials cited below before the docker_container usage in my playbook. Curiously, after using docker_login in this way, I see .docker/config.json in the sudo user's home directory reformatted slightly but with the same authorization token. It seems it's rewritten with the same information.

Am I misunderstanding what the variables below accomplish? Am I using them incorrectly to accomplish what I'm after?

What information can I provide to troubleshoot this issue?

Thank you for any help.

  vars:
    - docker__users: ["{{ ansible_env.SUDO_USER }}"]
    - docker__login_become_user: "{{ docker__users | first }}"
    - docker__registries:
      - registry_url: "https://index.docker.io/v1/"
        username: "<snip>"
        password: "<secret>"
        reauthorize: true
        state: "present"
nickjj commented 3 years ago

Hi,

What happens if you don't set reauthorize: true?

Also technically the registry_url and state: "present" don't need to be set since those are the default values.
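
In other words, something like this should be enough (same placeholders as in your example):

  vars:
    - docker__registries:
      - username: "<snip>"
        password: "<secret>"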

mkarlesky commented 3 years ago

Thank you so much for the quick reply.

The configuration blurb above is the most complete set of options I've tried. I've seen the same behavior without registry_url and without reauthorize or state.

Until now I haven't tried to understand the Docker login process in any depth. Am I correct in understanding that the Docker login process generates the following in config.json, and that this token is what is used to access a private Docker registry (DockerHub in this case)? Is there more involved? If not, then the role certainly seems to be doing its job. After it runs, I see a .docker/config.json file such as the following in the executing user's home directory.

{
    "auths": {
        "https://index.docker.io/v1/": {
            "auth": "<token>"
        }
    }
}

If the file above is pretty much the end of it, then perhaps I have a user problem between executing the role and the subsequent playbooks? That is, maybe my playbooks with docker_container references are executing as a different user than the role did? Both playbooks run with the same inventories and become: true, so I'm not sure what to make of that.

  vars:
    - ansible_python_interpreter: auto
    - docker__users: ["{{ ansible_env.SUDO_USER }}"]
    - docker__login_become_user: "{{ docker__users | first }}"
    - docker__registries:
      - username: "<snip>"
        password: "<secret>"

I just tried everything again from scratch with a clean Vagrant instance (and the trimmed-down role vars above). After executing the role, I confirmed that /home/vagrant/.docker/config.json exists with the content shown above. When I then run a second playbook that pulls a private image from DockerHub, I see the following error:

FAILED! => {"changed": false, "msg": "Error pulling image <snip> - 404 Client Error for http+docker://localhost/v1.41/images/create?tag=1.1&fromImage=<snip>: Not Found (\"pull access denied for <image>, repository does not exist or may require 'docker login': denied: requested access to the resource is denied\")"}

What I see in Vagrant I also see on a remote production system; the behavior is identical.

One last little detail… If I ssh into the Vagrant box after the failed playbook execution and run sudo docker login from within the vagrant user's home directory, I see this: “Authenticating with existing credentials...”. The login process does not ask for a username or password, and afterwards the content of config.json is identical but reformatted with extra whitespace. I can then manually pull private images without issue. However, my playbook with docker_container calls still fails to pull private images. I'm forced to add docker_login to my playbook before the docker_container calls involving the private images.

Any ideas? Thank you again.

nickjj commented 3 years ago

Am I correct in understanding that the Docker login process generates the following in config.json and that this token is what is used to access a private Docker registry (DockerHub in this case)?

Yeah, once that file is in place you should be able to log into your server and then docker pull a private image for the account that the token was generated for.

You can test if you're logged in correctly by SSHing into your server and running: docker pull mkarlesky/foo. If you're logged in correctly it will say that image is not found. If you're not logged in correctly it will say pull access is denied. That's a quick way to make sure things are working without having to push a real image to your private account.

Before going over the rest of your comment, that config file is going to end up being saved into your Docker user's home directory. That means only your Docker user will be able to log in using those credentials.

If you run your other playbook as root, the Docker client will look for the root user's credentials which won't be found and then you'll get denied.

Your custom role or playbook will need to execute any Docker-related tasks as the Docker user you used during the setup of this role, and if you plan to use the docker_* modules from Ansible, you'll need to make sure this is set up: https://github.com/nickjj/ansible-docker#working-with-ansibles-docker_-modules
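
As a rough sketch (not your exact playbook, and the container name and image are made up), the play that runs docker_container could look something like this, connecting as the user you listed in docker__users and not escalating for the Docker tasks:

- hosts: all

  vars:
    # Interpreter from the role's virtualenv, per the README section linked above
    ansible_python_interpreter: "{{ '/usr/bin/env python3-docker' }}"

  tasks:
    - name: Pull and start a private image as the Docker user
      docker_container:
        container_default_behavior: no_defaults
        name: example                           # hypothetical container name
        image: yourname/private-image:latest    # hypothetical private image
        state: started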

mkarlesky commented 3 years ago

Ah ha. Alas, I am already setting ansible_python_interpreter in the vars section of the playbook that uses docker_container. Does the following seem correct?

  vars:
    ansible_python_interpreter: "{{ '/usr/bin/env python3-docker' }}"

I'm wondering if something is off with the permissions, owner, or group of /home/vagrant/.docker/config.json. At the command line within the Vagrant VM, I'm forced to use sudo for some docker commands that access config.json, even though the vagrant user executing those commands is in the docker group.

vagrant@ubuntu-bionic:~$ groups vagrant
vagrant : vagrant docker
vagrant@ubuntu-bionic:~$ ls -al .docker
total 12
drwxr-xr-x 2 root    root    4096 Mar  2 15:16 .
drwxr-xr-x 7 vagrant vagrant 4096 Mar  2 15:16 ..
-rw------- 1 root    root     170 Mar  2 15:16 config.json

Should config.json exist with root:root ownership? Perhaps I'm not configuring Docker users correctly or executing the playbook that uses the nickjj.docker role as the wrong user? I've verified that the docker_container tasks are executing as the vagrant user.

Thank you again for the help. I think we're getting close.

nickjj commented 3 years ago

Hmm.

Version 2.1.0, released last month, should have fixed the ownership of the config.json file so it belongs to the correct user. The commit for that is here: https://github.com/nickjj/ansible-docker/commit/c8c090137779a915817143be1932f45321bafda6

And since you mentioned using docker__login_become_user, I'm guessing you are using v2.1.0 of this role? Did you maybe run the role without setting docker__login_become_user first, so that now it can't update the file?

mkarlesky commented 3 years ago

According to the CHANGELOG I'm using v2.0.0 of the role. Is v2.1.0 the magic here?

The complete configuration for the role is below. I've destroyed and re-upped Vagrant several times now. I'm confident docker__login_become_user has been used in each experiment.

- hosts: all
  become: true

  roles:
    - nickjj.docker

  vars:
    - docker__users: ["{{ ansible_env.SUDO_USER }}"]
    - docker__login_become_user: "{{ docker__users | first }}"
    - docker__registries:
      - username: "<snip>"
        password: "<secret>"
    - docker__default_daemon_json: |
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "10m",
          "max-file": "25"
        }
    - docker__default_pip_packages:
      - name: docker-compose
        state: absent
      - name: docker
        state: present
mkarlesky commented 3 years ago

I updated to v2.1.0 of the role and executed it against a clean Vagrant box with the config in the previous comment.

Docker installation now fails with: TASK [nickjj.docker : Manage Docker registry login credentials] ****************************** failed: [192.168.33.11] (item={'username': '<snip>', 'password': '<secret>'}) => {"ansible_loop_var": "item", "changed": false, "item": {"password": "<secret>", "username": "<snip>"}, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))"}

Am I missing a necessary configuration option?

nickjj commented 3 years ago

That's probably because you have a file owned by root, and now that the role is properly trying to use the correct user, it can't write over the file.

If you delete that ~/.docker directory and let the role make a new directory + config file you should be good to go with v2.1.0.
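
Something like this one-off task would clear it out (the path assumes the vagrant user from your earlier output), or you can just rm -rf it over SSH:

- name: Remove the stale root-owned .docker directory
  become: true
  file:
    path: /home/vagrant/.docker
    state: absent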

mkarlesky commented 3 years ago

Apologies. I neglected to mention that the run of the v2.1.0 role was against a fresh Vagrant VM. That failure was the first execution of the role's Docker installation on that VM, and it occurs during the registry login credentials setup step.

nickjj commented 3 years ago

Are you sure? You did mention in your previous comment that you had an existing config file there, before you edited the message.

mkarlesky commented 3 years ago

I'm sure. I confirmed the absence of /home/vagrant/.docker after recreating the VM and before executing the first playbooks to configure the clean VM.

I found the problem. It's something to do with the user executing the playbook.

If I run the playbook (config above) that uses the nickjj.docker role three times, with tweaks between runs, I can successfully configure the machine for Docker usage and run private Docker images.

  1. Run the playbook as shown above with become: true so that Docker dependencies can be installed. The role fails when it reaches the point of trying to interact with the Docker API. config.json does not exist in /home/vagrant because the Docker login step did not work correctly when the role executed with become: true.
  2. Remove become: true from the playbook, and run it again. This time the Docker dependencies are already installed, and the Docker login credentials are created in /home/vagrant/.docker/config.json with vagrant:vagrant ownership and vagrant a member of the docker group. Role execution fails when trying to install the cleanup cron job.
  3. Add become: true back in and run the playbook a final time. The cron job is installed and the role completes its execution.

After executing the above, I'm able to run a subsequent playbook that makes use of private Docker images hosted at DockerHub.

To summarize:

  1. The original problem was due to my use of v2.0.0 of the role. Upgrading to v2.1.0 solved that essential problem.
  2. Something in my configuration of users or within the role itself is not properly “hopping” among privileges/users to successfully execute the role's steps.

Thank you again.

nickjj commented 3 years ago

You should always be running this role with become: true btw. There's specific logic happening in the individual tasks that need to be executed as another user. For example, the docker_login task which sets your credentials gets run with become_user (see https://github.com/nickjj/ansible-docker/blob/master/tasks/main.yml#L148).

Everything else gets run as root, but that involves installing packages, adjusting system config files, etc.
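
For reference, the pattern in that task looks roughly like this (simplified, not the role's exact code; the defaults are the ones mentioned earlier in this thread):

- name: Manage Docker registry login credentials
  docker_login:
    registry_url: "{{ item.registry_url | d('https://index.docker.io/v1/') }}"
    username: "{{ item.username }}"
    password: "{{ item.password }}"
  loop: "{{ docker__registries }}"
  become: true
  become_user: "{{ docker__login_become_user }}"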

On a completely fresh machine with no other credentials created what happens if you do this:

  1. Run a playbook that only runs this role with become: true
  2. Run your other playbook that runs various docker_* tasks separately (a totally separate ansible-playbook call)

Also please post your exact playbook files for both cases here.

mkarlesky commented 3 years ago

I've been executing two separate Docker playbooks all along. A summary follows; the file contents are at the bottom.

Thank you so much for all the help. I'm sure this has been a tedious issue thread.

  1. The first playbook is dedicated to installing and configuring Docker itself using the nickjj.docker role.
  2. The second playbook copies over files, sets up a Docker network, and spins up a variety of Docker containers including a handful of private images from DockerHub.
---

##
## Install & configure Docker 
##

- hosts: all
  become: true

  roles:
    - nickjj.docker

  vars:
    - docker__users: ["{{ ansible_env.SUDO_USER }}"]
    - docker__login_become_user: "{{ docker__users | first }}"
    - docker__registries:
      - username: "<snip>"
        password: "<secret>"
    - docker__default_daemon_json: |
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "10m",
          "max-file": "25"
        }
    - docker__default_pip_packages:
      # Don't install docker-compose
      - name: docker-compose
        state: absent
      # Overriding the defaults to omit the preceding requires this be explicit
      - name: docker
        state: present

  tasks:
    - name: Ensure docker is running and starts at boot.
      service:
        name: docker
        state: started
        enabled: true
---

- hosts: all

  vars_files:
    - ../../../vars.yml
    - ../vars.yml

  vars:
    mqtt_network: mqtt
    mqtt_broker_name: mosquitto
    # Python virtual environment config to work with Docker setup via nickjj.docker role 
    ansible_python_interpreter: "{{ '/usr/bin/env python3-docker' }}"

  tasks:
    - name: Build list of all running Docker containers
      docker_host_info:
        containers: True
      register: docker_info

    ## Config file updates
    - name: Remove Docker volumes config files
      become: true
      file:
        path: "{{ container_volumes_basepath }}/config/"
        state: absent

    - name: Copy Docker volume config files
      become: true
      copy:
        dest: "{{ container_volumes_basepath }}/config/"
        src: "files/volumes/config/"

    ## Docker networks
    - name: Create MQTT Docker network
      docker_network:
        name: "{{ mqtt_network }}"

    ## MQTT Broker Container
    - name: Add & run Mosquitto MQTT Broker container
      docker_container:
        container_default_behavior: no_defaults
        name: "{{ mqtt_broker_name }}"
        image: eclipse-mosquitto:1.4.12
        command: "mosquitto --verbose"
        ports:
          - "1883:1883"
        networks:
          - name: "{{ mqtt_network }}"
        networks_cli_compatible: true
        state: started
        restart_policy: always
        detach: true

    # <snip several more docker_container tasks>
nickjj commented 3 years ago

And for clarity, after running the first playbook it errors out when it hits the task that writes the config?

What version of Ansible are you using? And what exact OS / distro versions are you running it against?

mkarlesky commented 3 years ago

My host is Ansible 2.10.3 (Python 3.9.0) on macOS 10.15.7.

The remote system I'm developing against is Ubuntu 18.04.5 running in VirtualBox via Vagrant. (The intended production system is also Ubuntu 18.04.5.)

Yes. The playbook errors out at the registry login step (see below).


I just ran vagrant destroy on the test environment and spun up a clean VM. I used a simple Ansible playbook to install aptitude, chrony, and Python 3.

When I run the Docker installation playbook as shown in my previous entry in this thread (## Install & configure Docker) I see this at the end of the run:

TASK [nickjj.docker : Manage Docker registry login credentials] ************************************************************************************************************************
failed: [192.168.33.11] (item={'username': '<snip>', 'password': '<secret>'}) => {"ansible_loop_var": "item", "changed": false, "item": {"password": "<secret>", "username": "<snip>"}, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))"}

PLAY RECAP *****************************************************************************************************************************************************************************
192.168.33.11              : ok=14   changed=11   unreachable=0    failed=1    skipped=4    rescued=0    ignored=0

If I run the same playbook again but make the following two changes (sketched below), the registry login step succeeds, but the playbook then fails when attempting to set up the cron job due to insufficient privileges.

  1. Remove become: true
  2. Modify the vars to use docker__users: ["vagrant"] instead of docker__users: ["{{ ansible_env.SUDO_USER }}"]
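
That is, the top of the playbook for that second run looks roughly like this (everything else unchanged from the version posted above):

- hosts: all

  roles:
    - nickjj.docker

  vars:
    - docker__users: ["vagrant"]
    - docker__login_become_user: "{{ docker__users | first }}"
    # ...remaining vars unchanged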
nickjj commented 3 years ago

Which Vagrant box are you using?

Have you tried it on a non-Vagrant server?

Have you tried creating a regular user instead of vagrant and using a proper SSH key to connect?

Does the Vagrant user normally have passwordless sudo enabled?

mkarlesky commented 3 years ago

I finally had a chance to try setting up a new user with a proper SSH key and passwordless sudo, instead of the vagrant user, on a new local Vagrant box. This worked! Version 2.1.0 of the role installed and configured Docker as desired, and I was then able to pull private Docker images and run them using a second playbook without issue.

I did need to make one change. Instead of docker__users: ["{{ ansible_env.SUDO_USER }}"] I had to use docker__users: ["{{ <new_user> }}"], where <new_user> is the new passwordless sudo user I set up on the clean Vagrant box for this test. I believe I got the original line from an example configuration for the nickjj.docker role. For some reason things only worked when listing the user explicitly.

nickjj commented 3 years ago

Thanks for reporting back.

Does your SUDO_USER have passwordless sudo enabled btw? What did ansible_env.SUDO_USER evaluate to when you ran it?

nickjj commented 3 years ago

I'm going to close this one as it feels like it's related to Vagrant. Feel free to re-open it if it ends up being something else. The documentation also suggests setting docker__users: ["{{ ansible_env.SUDO_USER | d('root') }}"], which gives you a fallback if SUDO_USER happens to be undefined.
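
In full, that would look like this in the vars block (paired with the login become user variable used earlier in this thread):

  vars:
    - docker__users: ["{{ ansible_env.SUDO_USER | d('root') }}"]
    - docker__login_become_user: "{{ docker__users | first }}"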