Open Payback159 opened 5 months ago
Looks like it's an bug in OpenSSH and I am not the only one with the problem. https://github.com/ansible/ansible/issues/81777
I don't think we would need to use a fact, a registered variable is available during the rest of the playbook runs, so we can just use the registered variable directly and avoid setting a fact.
Hello @VannTen ,
thanks for your feedback. At first I also thought that it is not necessary to set a fact and as a first approach I just tried to extend the task with the condition
...
when:
...
- docker_images is undefined
The task as a whole example then looked like this:
# The image_info_command depends on the Container Runtime and will output something like the following:
# nginx:1.15,gcr.io/google-containers/kube-proxy:v1.14.1,gcr.io/google-containers/kube-proxy@sha256:44af2833c6cbd9a7fc2e9d2f5244a39dfd2e31ad91bf9d4b7d810678db738ee9,gcr.io/google-containers/kube-apiserver:v1.14.1,etc...
- name: Check_pull_required | Generate a list of information about the images on a node # noqa command-instead-of-shell - image_info_command contains a pipe, therefore requiring shell
- name: Check_pull_required | Generate a list of information about the images on a node # noqa command-instead-of-shell - image_info_command contains a pipe, therefore requiring shell
shell: "{{ image_info_command }}"
register: docker_images
changed_when: false
check_mode: no
when: not download_always_pull
when:
- not download_always_pull
- docker_images is undefined
But this always leads me to the problem that in the first run (first image, not ansible run), the variable is set correctly. But the second run/image Ansible throws an error that stdout isn't defined.
When the tasks look like that:
_Short note: Here I have replaced docker_images
with node_images
, but it shouldn't really make a difference, I just thought if I change something here I could rename it to the generic name, since the task doesn't only work with docker images._
---
# The image_info_command depends on the Container Runtime and will output something like the following:
# nginx:1.15,gcr.io/google-containers/kube-proxy:v1.14.1,gcr.io/google-containers/kube-proxy@sha256:44af2833c6cbd9a7fc2e9d2f5244a39dfd2e31ad91bf9d4b7d810678db738ee9,gcr.io/google-containers/kube-apiserver:v1.14.1,etc...
- name: Check_pull_required | Generate a list of information about the images on a node # noqa command-instead-of-shell - image_info_command contains a pipe, therefore requiring shell
shell: "{{ image_info_command }}"
register: node_images
changed_when: false
check_mode: no
when:
- not download_always_pull
- node_images is undefined
- name: Check_pull_required | Set pull_required if the desired image is not yet loaded
set_fact:
pull_required: >-
{%- if image_reponame | regex_replace('^docker\.io/(library/)?', '') in node_images.stdout.split(',') %}false{%- else -%}true{%- endif -%}
when:
- not download_always_pull
- name: Check_pull_required | Check that the local digest sha256 corresponds to the given image tag
assert:
that: "{{ download.repo }}:{{ download.tag }} in node_images.stdout.split(',') }}"
when:
- not download_always_pull
- not pull_required
- pull_by_digest
tags:
- asserts
I am getting following error
# First Run/image is behaving like expected
TASK [download : Check_pull_required | Generate a list of information about the images on a node] ***
task path: /kubespray/roles/download/tasks/check_pull_required.yml:4
ok: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-1] => {"changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io images --format '{{ .Repository }}:{{ .Tag }}' 2>/dev/null | grep -v ^:$ | tr '\n' ','", "delta": "0:00:00.026868", "end": "2024-04-22 11:34:16.897943", "msg": "", "rc": 0, "start": "2024-04-22 11:34:16.871075", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-2] => {"changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io images --format '{{ .Repository }}:{{ .Tag }}' 2>/dev/null | grep -v ^:$ | tr '\n' ','", "delta": "0:00:00.027118", "end": "2024-04-22 11:34:16.923514", "msg": "", "rc": 0, "start": "2024-04-22 11:34:16.896396", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-node-nf-1] => {"changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io images --format '{{ .Repository }}:{{ .Tag }}' 2>/dev/null | grep -v ^:$ | tr '\n' ','", "delta": "0:00:00.028809", "end": "2024-04-22 11:34:16.976791", "msg": "", "rc": 0, "start": "2024-04-22 11:34:16.947982", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-3] => {"changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io images --format '{{ .Repository }}:{{ .Tag }}' 2>/dev/null | grep -v ^:$ | tr '\n' ','", "delta": "0:00:00.026440", "end": "2024-04-22 11:34:16.973733", "msg": "", "rc": 0, "start": "2024-04-22 11:34:16.947293", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-node-nf-2] => {"changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io images --format '{{ .Repository }}:{{ .Tag }}' 2>/dev/null | grep -v ^:$ | tr '\n' ','", "delta": "0:00:00.028615", "end": "2024-04-22 11:34:17.031499", "msg": "", "rc": 0, "start": "2024-04-22 11:34:17.002884", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
Monday 22 April 2024 11:33:34 +0000 (0:00:00.374) 0:05:45.887 **********
TASK [download : Check_pull_required | Set pull_required if the desired image is not yet loaded] ***
task path: /kubespray/roles/download/tasks/check_pull_required.yml:20
ok: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-1] => {"ansible_facts": {"pull_required": true}, "changed": false}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-2] => {"ansible_facts": {"pull_required": true}, "changed": false}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-3] => {"ansible_facts": {"pull_required": true}, "changed": false}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-node-nf-1] => {"ansible_facts": {"pull_required": true}, "changed": false}
ok: [kubespray-2-24-1-50-beb0cf006-k8s-node-nf-2] => {"ansible_facts": {"pull_required": true}, "changed": false}
Monday 22 April 2024 11:33:35 +0000 (0:00:00.190) 0:05:46.078 **********
Monday 22 April 2024 11:33:35 +0000 (0:00:00.142) 0:05:46.220 **********
...
# Second image fails
TASK [download : Download_container | Prepare container download] **************
task path: /kubespray/roles/download/tasks/download_container.yml:18
included: /kubespray/roles/download/tasks/check_pull_required.yml for kubespray-2-24-1-50-beb0cf006-k8s-master-nf-1, kubespray-2-24-1-50-beb0cf006-k8s-master-nf-2, kubespray-2-24-1-50-beb0cf006-k8s-master-nf-3, kubespray-2-24-1-50-beb0cf006-k8s-node-nf-1, kubespray-2-24-1-50-beb0cf006-k8s-node-nf-2
Monday 22 April 2024 11:33:49 +0000 (0:00:00.240) 0:06:00.507 **********
Monday 22 April 2024 11:33:49 +0000 (0:00:00.151) 0:06:00.659 **********
TASK [download : Check_pull_required | Set pull_required if the desired image is not yet loaded] ***
task path: /kubespray/roles/download/tasks/check_pull_required.yml:20
fatal: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '/kubespray/roles/download/tasks/check_pull_required.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check_pull_required | Set pull_required if the desired image is not yet loaded\n ^ here\n"}
fatal: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-2]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '/kubespray/roles/download/tasks/check_pull_required.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check_pull_required | Set pull_required if the desired image is not yet loaded\n ^ here\n"}
fatal: [kubespray-2-24-1-50-beb0cf006-k8s-master-nf-3]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '/kubespray/roles/download/tasks/check_pull_required.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check_pull_required | Set pull_required if the desired image is not yet loaded\n ^ here\n"}
fatal: [kubespray-2-24-1-50-beb0cf006-k8s-node-nf-1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '/kubespray/roles/download/tasks/check_pull_required.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check_pull_required | Set pull_required if the desired image is not yet loaded\n ^ here\n"}
fatal: [kubespray-2-24-1-50-beb0cf006-k8s-node-nf-2]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'\n\nThe error appears to be in '/kubespray/roles/download/tasks/check_pull_required.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Check_pull_required | Set pull_required if the desired image is not yet loaded\n ^ here\n"}
If I interpret the error message correctly, it is because the variable node_images does not remain set, contrary to the assumption.
Unfortunately I can't explain 100% why it doesn't stay set (since I originally made the same assumption as you) but maybe you can think of something else.
Thank you and kind regards! :-)
If I interpret the error message correctly, it is because the variable node_images does not remain set, contrary to the assumption.
Unfortunately I can't explain 100% why it doesn't stay set (since I originally made the same assumption as you) but maybe you can think of something else.
AFAICT from your logs, the variable is set but does not contains a stdout
like it should it if was the result of the command.
I see you use conditions depending on whether the variable is undefined, so maybe the following is happening :
result is skipped
in Ansible with thisstdout
anymore, but it's still defined.Does it match what you're seeing ? You can probably insert debug
tasks to see what's in the registered variable at various stages.
IMO, the simple solution (note that I didn't say the easy one ^) would be to have that tasks run only once, without the condition. I don't have the time to dig into it right now, but that should be doable.
Hello @VannTen ,
thanks for the hint, I didn't even think of that. I have now added a debug step and in the 2nd run (where it breaks) I get the following message:
TASK [download : Debug | Print node_images.stdout] *****************************
task path: /kubespray/roles/download/tasks/check_pull_required.yml:13
ok: [kubespray-2-24-1-53-d9473e943-k8s-master-nf-1] => {
"node_images.stdout": "VARIABLE IS NOT DEFINED!: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'"
}
ok: [kubespray-2-24-1-53-d9473e943-k8s-master-nf-2] => {
"node_images.stdout": "VARIABLE IS NOT DEFINED!: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'"
}
ok: [kubespray-2-24-1-53-d9473e943-k8s-master-nf-3] => {
"node_images.stdout": "VARIABLE IS NOT DEFINED!: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'"
}
ok: [kubespray-2-24-1-53-d9473e943-k8s-node-nf-1] => {
"node_images.stdout": "VARIABLE IS NOT DEFINED!: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'"
}
ok: [kubespray-2-24-1-53-d9473e943-k8s-node-nf-2] => {
"node_images.stdout": "VARIABLE IS NOT DEFINED!: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'"
}
It seems the node_images variable is not preserved in the whole ansible run, because the file check_pull_required.yml
are included dynamically via include_tasks
in the download_container.yml file?
I have now dragged the tasks for creating the node image lists into download_container.yml and passed node_images.stdout
as variable to check_pull_required.yml. This way node_images is preserved throughout the entire Ansible run and I can reuse the list for all images.
download_container.yml
...
# The image_info_command depends on the Container Runtime and will output something like the following:
# nginx:1.15,gcr.io/google-containers/kube-proxy:v1.14.1,gcr.io/google-containers/kube-proxy@sha256:44af2833c6cbd9a7fc2e9d2f5244a39dfd2e31ad91bf9d4b7d810678db738ee9,gcr.io/google-containers/kube-apiserver:v1.14.1,etc...
- name: Download_container | Generate a list of information about the images on a node # noqa command-instead-of-shell - image_info_command contains a pipe, therefore requiring shell
shell: "{{ image_info_command }}"
register: node_images
changed_when: false
check_mode: no
when:
- not download_always_pull
- node_images is undefined
- name: Download_container | Prepare container download
include_tasks: check_pull_required.yml
when:
- not download_always_pull
vars:
node_images_stdout: "{{ node_images.stdout }}"
...
check_pull_required.yml
---
- name: Check_pull_required | Set pull_required if the desired image is not yet loaded
set_fact:
pull_required: >-
{%- if image_reponame | regex_replace('^docker\.io/(library/)?', '') in node_images_stdout.split(',') %}false{%- else -%}true{%- endif -%}
when:
- not download_always_pull
- node_images.stdout is defined
- name: Check_pull_required | Check that the local digest sha256 corresponds to the given image tag
assert:
that: "{{ download.repo }}:{{ download.tag }} in node_images_stdout.split(',') }}"
when:
- not download_always_pull
- not pull_required
- pull_by_digest
- node_images.stdout is defined
tags:
- asserts
I am also running an upgrade cluster to check the logic with a "filled" container runtime. So far I have only tried it with cluster.yml (i.e. always fresh clusters) and the list is of course empty, let's see if it also fills successfully when the container runtime has already been used.
I don't see why it should behave differently, but better tested than wrongly assumed.
It seems the node_images variable is not preserved in the whole ansible run, because the file
check_pull_required.yml
are included dynamically viainclude_tasks
in the download_container.yml file?
Aren't you trying to print node_images.stdout rather than nodes_images ? To me the error message looks like node_images is in fact defined, but it does not have a key stdout
. Otherwise, where would that 'dict object' come from ?
ok: [kubespray-2-24-1-53-d9473e943-k8s-master-nf-1] => { "node_images.stdout": "VARIABLE IS NOT DEFINED!: 'dict object' has no attribute 'stdout'. 'dict object' has no attribute 'stdout'" }
If that's the case I don't quite understand how Ansible handles variable registration. Since the first run can access node_images.stdout but the 2nd breaks with the fact that node_images has no stdout.
I can still try if I can generally access node_images as dict_object and not node_images.stdout. So I wouldn't have to check the node_images in download_container.yml and pass them as variables to the include_tasks of check_pull_required.yml.
I tried to print node_images.stdout because the original tasks always went to stdout and I wanted to make as few changes as possible.
But if it makes more sense to leave the variable handling as close as possible to the tasks, I can still try to access node_images instead of node_images.stdout. But for this I have to check whether those check still fit.
- name: Check_pull_required | Set pull_required if the desired image is not yet loaded
set_fact:
pull_required: >-
{%- if image_reponame | regex_replace('^docker\.io/(library/)?', '') in node_images_stdout.split(',') %}false{%- else -%}true{%- endif -%}
when:
- not download_always_pull
- node_images.stdout is defined
- name: Check_pull_required | Check that the local digest sha256 corresponds to the given image tag
assert:
that: "{{ download.repo }}:{{ download.tag }} in node_images_stdout.split(',') }}"
when:
- not download_always_pull
- not pull_required
- pull_by_digest
- node_images.stdout is defined
tags:
- asserts
If that's the case I don't quite understand how Ansible handles variable registration. Since the first run can access node_images.stdout but the 2nd breaks with the fact that node_images has no stdout.
As I said before, I believe it's overridden by register
when the task is skipped (a skipped task still has a result which you can register). You should print node_images
with debug to see what's inside.
I have now created a new branch and only added the debug step to output node_images and get the following result in the 2nd run
check_pull_required.yml
- name: Check_pull_required | Generate a list of information about the images on a node # noqa command-instead-of-shell - image_info_command contains a pipe, therefore requiring shell
shell: "{{ image_info_command }}"
register: node_images
changed_when: false
check_mode: no
when:
- not download_always_pull
- node_images is undefined
- name: Check_pull_required | Print the list of images on the node
debug:
var: node_images
when:
- not download_always_pull
Output:
TASK [download : Check_pull_required | Print the list of images on the node] ***
task path: /kubespray/roles/download/tasks/check_pull_required.yml:13
ok: [kubespray-2-24-1-checkpullrequired-2-2bb8cb34e-k8s-master-nf-1] => {
"node_images": {
"changed": false,
"false_condition": "node_images is undefined",
"skip_reason": "Conditional result was False",
"skipped": true
}
}
It can still process the first image because "Check_pull_required | Set pull_required if the desired image is not yet loaded" is executed. With the 2nd image, the task is no longer executed and access to node_images fails.
I think that because of the include_tasks expression, the variable only exists while check_pull_required.yml is running and as soon as Ansible switches back to download_container.yml, the variable is cleaned up.
OK, my last sentences don't make sense, because if the variable node_images is undefined, the function "Check_pull_required | Set pull_required if the desired image is not yet loaded" would be executed also in the 2nd run.
OK, I think I've understood you now. I thought the register is only executed if the task is also executed, but register is always executed by Ansible, regardless of whether the when condition applies or not and that explains the behavior.
But then I only have the option of generating the image list in download_container.yml or alternatively using a facts variable, right?
You should pull out the image list generation from the "image loop" (download_container.yml is repeatedly included in download/main.yml) since there is really no reason we should do that. There is probably a way to do a "register just once", but it would be simpler to just not include it in the loop in the first place.
(The download role is a bit convoluted, to put it gently)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/reopen
/remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
@VannTen: Reopened this issue.
What would you like to be added
I would like to cache the list of node images through following extension:
git diff Output of roles/download/tasks/check_pull_required.yml
Why is this needed
I have several Kubernetes clusters with 60-90 nodes.
From this size, I notice that the upgrade-cluster.yml keeps aborting with the following error message and sometimes the upgrades would no longer work at all without separation with node limits. However, the error message does not always appear on the same node, but is randomly distributed across several nodes.
I have therefore checked how often and why the task is called.
In my case the task is called 20 times, but for my understanding it would be enough if the task is only called once for the whole run and the result can be cached. I don't think that the list changes during the Kubespray upgrade run in such a way that the changes would be relevant for the Kubespray run. Therefore, from my point of view, it makes no sense to fetch the list every time I have to download another image.
That means 20 x "node count" this error can occur and also executing 20 x the{{ image_info_command }} against the container runtime.
I'm currently testing the fix, but I wanted to get the opinions of the community in advance.