Hi,

What happens if you don't set reauthorize: true?

Also, technically the registry_url and state: "present" don't need to be set, since those are the default values.
Thank you so much for the quick reply.

The configuration blurb above is the most complete set of options I've tried. I've seen the same behavior without registry_url and without reauthorize or state.

Until this experience I've not tried to understand the Docker login process in any depth. Am I correct in understanding that the Docker login process generates the following in config.json, and that this token is what is used to access a private Docker registry (DockerHub in this case)? Is there more involved? If not, then the role certainly seems to be doing its job. After it runs I see a .docker/config.json file such as the following in the executing user's home directory.
{
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "<token>"
    }
  }
}
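(As far as I can tell, and this is my own assumption rather than anything from the role's docs, the auth value is just the base64 encoding of username:password, so it's easy to double-check with a throwaway Ansible task:)

# Throwaway check: recompute the token the way `docker login` stores it.
# <username> and <password> are stand-ins for the real registry credentials.
- name: Show the expected config.json auth value
  debug:
    msg: "{{ ('<username>' ~ ':' ~ '<password>') | b64encode }}"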
If the file above is pretty much the end of it, then perhaps I have a user problem between executing the role and subsequent playbooks? That is, maybe my playbooks with docker_container references are executing with a different user than the role? Both playbooks are running with the same inventories and become: true, so I'm not sure what to think about that.
vars:
  - ansible_python_interpreter: auto
  - docker__users: ["{{ ansible_env.SUDO_USER }}"]
  - docker__login_become_user: "{{ docker__users | first }}"
  - docker__registries:
      - username: "<snip>"
        password: "<secret>"
I just tried everything again from scratch with a clean Vagrant instance (and trimmed-down role vars as above). After executing the role, I confirmed that /home/vagrant/.docker/config.json exists with content as above. After this, when I run a second playbook that pulls a private image from DockerHub, I see the following error:
FAILED! => {"changed": false, "msg": "Error pulling image <snip> - 404 Client Error for http+docker://localhost/v1.41/images/create?tag=1.1&fromImage=<snip>: Not Found (\"pull access denied for <image>, repository does not exist or may require 'docker login': denied: requested access to the resource is denied\")"}
What I see in Vagrant I also see on a remote production system; the behavior is identical.
One last little detail… If I ssh into the Vagrant box after the failed playbook execution and run sudo docker login from within the vagrant user's home directory, I see this: "Authenticating with existing credentials...". The login process does not ask for a username or password, and afterwards the content of config.json is identical but reformatted with extra whitespace. I can then manually pull private images without issue. However, my playbooks with docker_container calls still fail to pull private images. I'm forced to add docker_login in my playbook before docker_container calls involving the private images, as sketched below.
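For completeness, the workaround looks roughly like this in my playbook (a sketch; the task and container names are placeholders and credentials are elided):

# Workaround sketch: explicit login immediately before using a private image
- name: Work around the missing credentials by logging in explicitly
  docker_login:
    username: "<snip>"
    password: "<secret>"

- name: Run a container from a private image
  docker_container:
    container_default_behavior: no_defaults
    name: my-private-service   # placeholder name
    image: "<snip>"
    state: started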
Any ideas? Thank you again.
Am I correct in understanding that the Docker login process generates the following in config.json and that this token is what is used to access a private Docker registry (DockerHub in this case)?
Yeah, once that file is in place you should be able to log into your server and then docker pull a private image for the account that the token was generated for.

You can test if you're logged in correctly by SSHing into your server and running: docker pull mkarlesky/foo. If you're logged in correctly it will say that image is not found. If you're not logged in correctly it will say pull access is denied. That's a quick way to make sure things are working without having to push a real image to your private account.
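If you'd rather script that check than SSH in, something like this task should work (a sketch: it assumes the docker CLI is on the target and that vagrant is your Docker user):

- name: Check registry login by pulling a nonexistent private image
  become: true
  become_user: vagrant   # the user whose ~/.docker/config.json holds the token
  command: docker pull mkarlesky/foo
  register: pull_check
  changed_when: false
  # `docker pull` exits nonzero either way; only "denied" means login is broken
  failed_when: "'denied' in (pull_check.stderr | default(''))"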
Before going over the rest of your comment, that config file is going to end up being saved into your Docker user's home directory. That means only your Docker user will be able to log in using those credentials.
If you run your other playbook as root, the Docker client will look for the root user's credentials which won't be found and then you'll get denied.
Your custom role or playbook will need to execute any Docker related tasks as that Docker user you used during the set up of this role, and if you plan to use the docker_* modules from Ansible, you'll need to make sure this is set up: https://github.com/nickjj/ansible-docker#working-with-ansibles-docker_-modules
Ah ha. Alas, I am already setting ansible_python_interpreter in the vars section of the playbook that uses docker_container. Does the following seem correct?
vars:
  ansible_python_interpreter: "{{ '/usr/bin/env python3-docker' }}"
I'm wondering if there's something off with the permissions, owner, or group for /home/vagrant/.docker/config.json. When at the command line within the Vagrant VM, I'm forced to use sudo with some docker commands to access config.json, even though the vagrant user executing those commands is in the docker group.
vagrant@ubuntu-bionic:~$ groups vagrant
vagrant : vagrant docker
vagrant@ubuntu-bionic:~$ ls -al .docker
total 12
drwxr-xr-x 2 root root 4096 Mar 2 15:16 .
drwxr-xr-x 7 vagrant vagrant 4096 Mar 2 15:16 ..
-rw------- 1 root root 170 Mar 2 15:16 config.json
Should config.json exist with root:root ownership? Perhaps I'm not configuring Docker users correctly, or executing the playbook that uses the nickjj.docker role as the wrong user? I've verified that the docker_container tasks are executing as the vagrant user.
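In case it's useful, here's the kind of one-off ownership fix I could apply by hand (a sketch assuming the vagrant user; not something the role ships):

- name: Repair ownership of the Docker client config directory
  become: true
  file:
    path: /home/vagrant/.docker
    state: directory
    owner: vagrant
    group: vagrant
    recurse: true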
Thank you again for the help. I think we're getting close.
Hmm.
Version 2.1.0, released last month, should have fixed the file permission of the config.json file to be owned by the correct user. The commit for that is here: https://github.com/nickjj/ansible-docker/commit/c8c090137779a915817143be1932f45321bafda6

And since you mentioned using docker__login_become_user I'm guessing you are using v2.1.0 of this role? Did you maybe run the role without setting docker__login_become_user first, and now it can't update the file?
According to the CHANGELOG I'm using v2.0.0 of the role. Is v2.1.0 the magic here?

The complete configuration for the role is below. I've destroyed and re-upped Vagrant several times now. I'm confident docker__login_become_user has been used in each experiment.
- hosts: all
  become: true

  roles:
    - nickjj.docker

  vars:
    - docker__users: ["{{ ansible_env.SUDO_USER }}"]
    - docker__login_become_user: "{{ docker__users | first }}"
    - docker__registries:
        - username: "<snip>"
          password: "<secret>"
    - docker__default_daemon_json: |
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "10m",
          "max-file": "25"
        }
    - docker__default_pip_packages:
        - name: docker-compose
          state: absent
        - name: docker
          state: present
I updated to v2.1.0 of the role and executed it against a clean Vagrant box with the config in the previous comment.
Docker installation now fails with:
TASK [nickjj.docker : Manage Docker registry login credentials] ****************************** failed: [192.168.33.11] (item={'username': '<snip>', 'password': '<secret>'}) => {"ansible_loop_var": "item", "changed": false, "item": {"password": "<secret>", "username": "<snip>"}, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))"}
Am I missing a necessary configuration option?
That's probably because you have a file owned by root, and now that the role is properly trying to use the correct user, it can't write over the file.

If you delete that ~/.docker directory and let the role make a new directory + config file, you should be good to go with v2.1.0.
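If you'd rather do that cleanup from a playbook than by hand, a minimal sketch (assuming the vagrant user's home directory) would be:

- name: Remove the stale root-owned Docker client config
  become: true
  file:
    path: /home/vagrant/.docker
    state: absent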
Apologies. I neglected to add that the run of the v2.1.0 role was against a fresh Vagrant VM. The failed run was the first execution of Docker installation through the role on that VM. The failure occurs during the registry login credentials setup step.
Are you sure? You did mention in your previous comment that you had an existing config file there, before you edited the message.
I'm sure. I confirmed the absence of /home/vagrant/.docker after recreating the VM and before executing the first playbooks to configure the clean instance of the VM server.
I found the problem. It's something to do with the user executing the playbook.

If I run the playbook (config above) that uses the nickjj.docker role three times with tweaks between runs, I can successfully configure the machine for Docker usage and run private Docker images. To summarize:

1. Run the playbook with become: true so that Docker dependencies can be installed. The role fails when it reaches the point of trying to interact with the Docker API. config.json does not exist in /home/vagrant because the Docker login step did not work correctly when the role executed with become: true.
2. Remove become: true from the playbook, and run it again. This time the Docker dependencies are already installed, and the Docker login credentials are created in /home/vagrant/.docker/config.json with vagrant:vagrant ownership and vagrant a member of the docker group. Role execution fails when trying to install the cleanup cron job.
3. Put become: true back in and run the playbook a final time. The cron job is installed and the role completes its execution.

After executing the above, I'm able to run a subsequent playbook that makes use of private Docker images hosted at DockerHub.

Thank you again.
You should always be running this role with become: true btw. There's specific logic happening in the individual tasks that need to be executed as another user. For example, the docker_login task which sets your credentials gets run with become_user: https://github.com/nickjj/ansible-docker/blob/master/tasks/main.yml#L148

Everything else gets run as root, but that involves installing packages and adjusting system config files, etc.
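To illustrate that become_user pattern (a rough sketch, not the role's literal task):

# Sketch: the registry login task drops to the unprivileged Docker user
- name: Log into Docker registries as the unprivileged Docker user
  become: true
  become_user: "{{ docker__login_become_user }}"
  docker_login:
    registry_url: "{{ item.registry_url | default('https://index.docker.io/v1/') }}"
    username: "{{ item.username }}"
    password: "{{ item.password }}"
  loop: "{{ docker__registries }}"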
On a completely fresh machine with no other credentials created, what happens if you do this:

1. Run this role with become: true
2. Run your docker_* tasks separately (a totally separate ansible-playbook call)

Also please post your exact playbook files for both cases here.
I've been executing two separate Docker playbooks all along. Summary follows. File contents are at bottom.

Thank you so much for all the help. I'm sure this has been a tedious issue thread.

1. A playbook that installs and configures Docker via the nickjj.docker role.
2. A playbook that creates Docker networks and runs containers via docker_container tasks.
---
##
## Install & configure Docker
##

- hosts: all
  become: true

  roles:
    - nickjj.docker

  vars:
    - docker__users: ["{{ ansible_env.SUDO_USER }}"]
    - docker__login_become_user: "{{ docker__users | first }}"
    - docker__registries:
        - username: "<snip>"
          password: "<secret>"
    - docker__default_daemon_json: |
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "10m",
          "max-file": "25"
        }
    - docker__default_pip_packages:
        # Don't install docker-compose
        - name: docker-compose
          state: absent
        # Overriding the defaults to omit the preceding requires this be explicit
        - name: docker
          state: present

  tasks:
    - name: Ensure docker is running and starts at boot.
      service:
        name: docker
        state: started
        enabled: true
---
- hosts: all

  vars_files:
    - ../../../vars.yml
    - ../vars.yml

  vars:
    mqtt_network: mqtt
    mqtt_broker_name: mosquitto
    # Python virtual environment config to work with Docker setup via nickjj.docker role
    ansible_python_interpreter: "{{ '/usr/bin/env python3-docker' }}"

  tasks:
    - name: Build list of all running Docker containers
      docker_host_info:
        containers: True
      register: docker_info

    ## Config file updates
    - name: Remove Docker volumes config files
      become: true
      file:
        path: "{{ container_volumes_basepath }}/config/"
        state: absent

    - name: Copy Docker volume config files
      become: true
      copy:
        dest: "{{ container_volumes_basepath }}/config/"
        src: "files/volumes/config/"

    ## Docker networks
    - name: Create MQTT Docker network
      docker_network:
        name: "{{ mqtt_network }}"

    ## MQTT Broker Container
    - name: Add & run Mosquitto MQTT Broker container
      docker_container:
        container_default_behavior: no_defaults
        name: "{{ mqtt_broker_name }}"
        image: eclipse-mosquitto:1.4.12
        command: "mosquitto --verbose"
        ports:
          - "1883:1883"
        networks:
          - name: "{{ mqtt_network }}"
        networks_cli_compatible: true
        state: started
        restart_policy: always
        detach: true

    # <snip several more docker_container tasks>
And for clarity: after running the first playbook, it errors out when it hits the task that writes the config?
What version of Ansible are you using? What exact OS / distro versions are you running it against too?
My host is Ansible 2.10.3 (Python 3.9.0) on macOS 10.15.7.
The remote system I'm developing with is Ubuntu 18.04.5 running in Virtual Box via Vagrant. (The intended production system is also Ubuntu 18.04.5.)
Yes. The playbook errors out at the registry login step (see below).
I just ran vagrant destroy on the test environment and spun up a clean VM. I used a simple Ansible playbook to install aptitude, chrony, and Python 3.

When I run the Docker installation playbook as shown in my previous entry in this thread (## Install & configure Docker), I see this at the end of the run:
TASK [nickjj.docker : Manage Docker registry login credentials] ************************************************************************************************************************
failed: [192.168.33.11] (item={'username': '<snip>', 'password': '<secret>'}) => {"ansible_loop_var": "item", "changed": false, "item": {"password": "<secret>", "username": "<snip>"}, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))"}
PLAY RECAP *****************************************************************************************************************************************************************************
192.168.33.11 : ok=14 changed=11 unreachable=0 failed=1 skipped=4 rescued=0 ignored=0
If I run the same playbook again but make the following two changes, then the registry login step succeeds, but the playbook fails when attempting to set up the cron job due to insufficient privileges:

1. Remove become: true
2. Use docker__users: ["vagrant"] instead of docker__users: ["{{ ansible_env.SUDO_USER }}"]
Which Vagrant box are you using?
Have you tried it on a non-Vagrant server?
Have you tried creating a regular user instead of vagrant and using a proper SSH key to connect?
Does the Vagrant user normally have passwordless sudo enabled?
I finally had a chance to try setting up a new user with a proper SSH key and passwordless sudo instead of the vagrant user on a new, local vagrant box. This worked! Version 2.1.0 of the role worked in installing and configuring Docker as desired. I was then able to pull private Docker images and run them using a second playbook without issue.
I did need to make one change. Instead of docker__users: ["{{ ansible_env.SUDO_USER }}"] I had to use docker__users: ["{{ <new_user> }}"], where <new_user> is the new passwordless sudo user I set up on the clean vagrant box for this test. I believe I got the original line from an example configuration for the nickjj.docker role. For some reason things only worked by listing the user specifically.
Thanks for reporting back.

Does your SUDO_USER have passwordless sudo enabled btw? What did ansible_env.SUDO_USER evaluate to when you ran it?

I'm going to close this one as it feels like it's related to Vagrant. Feel free to re-open it again if it ends up being something else. The documentation also suggests setting docker__users: ["{{ ansible_env.SUDO_USER | d('root') }}"], which does give you a fallback plan if SUDO_USER happens to be undefined.
I'm not seeing the expected behavior for a Docker registry configuration, but I'm unsure of what the role's behavior should be and of my configuration of the role.

I've interpreted the role's login and registry features to leave the remote system in a state ready to execute docker pull through, for instance, subsequent use of Ansible's docker_container for the configured repository. In my case I want to pull private images from DockerHub.

The following configuration for the role executes successfully, and I find a .docker/config.json file in the sudo user's home directory with an authorization token within it. I think this means that everything is configured properly. However, when I try to use private Docker images in playbook runs following the role execution, I receive errors when accessing private images.

Everything works fine if I add a docker_login with the same credentials cited below before docker_container usage in my playbook. Curiously, after using docker_login in this way I see .docker/config.json in the sudo user's home directory reformatted slightly, but with the same authorization token. It seems it's rewritten with the same information.

Am I misunderstanding what the variables below accomplish? Am I using them incorrectly to accomplish what I'm after?

What information can I provide to troubleshoot this issue?

Thank you for any help.