Open davordbetter opened 7 months ago
Did you find any workaround for this? I'm getting the same issue while running the role on a bunch of hosts with both arm64 and amd64 architectures.
Hey @devmittal02, I haven't checked this out yet, as we are building a new role for Grafana Agent in flow mode (the recommended way now), so we can probably test this out on that role.
If you wanna double check, we have a PR open so I can get any changes you want in that right now.
My "workaround" is to put the arm and amd VMs in different groups and run two pipelines with an inventory limit (-l).
This seems like a very weird issue. @davordbetter, any thoughts on why this is failing specifically on GitLab?
@devmittal02 What platform are you running the playbook on?
Hey, I think the issue is caused by this "run once". I am running on AWX against an entire fleet of EC2 machines; it spins up an on-demand container and triggers the playbook across the machines using SSM.
What happens is this: say the first machine it runs against is AMD, so it downloads only the binary for that architecture and stores it locally. When an ARM machine comes next, the download step is skipped because of "run once", and only the previously downloaded AMD variant of the binary is available to copy. Hence the "file doesn't exist" error, since it is the wrong binary.
- name: Download Grafana Agent binary to controller (localhost)
  block:
    - name: Create Grafana Agent temp directory
      become: false
      ansible.builtin.file:
        path: "{{ grafana_agent_local_tmp_dir }}"
        state: directory
        mode: 0751
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Download Grafana Agent archive to local folder
      become: false
      ansible.builtin.get_url:
        url: "{{ _grafana_agent_download_url }}"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        mode: 0664
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Extract grafana-agent.zip
      become: false
      ansible.builtin.unarchive:
        src: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        dest: "{{ grafana_agent_local_tmp_dir }}"
        remote_src: false
      delegate_to: localhost
      run_once: true
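To check which host a "run_once" task is actually evaluated for, a small diagnostic play along these lines may help. It is only a sketch and not part of the role: the play and task names are made up for illustration, and it assumes facts are gathered for all hosts.

```yaml
---
# Diagnostic sketch (hypothetical playbook, not part of the role).
- name: Show which host a run_once task is evaluated for
  hosts: all
  gather_facts: true
  tasks:
    - name: Print the host and architecture seen by a run_once task
      ansible.builtin.debug:
        # Variables are templated for the inventory host the task runs for,
        # not for the delegated host (localhost).
        msg: "run_once evaluated for {{ inventory_hostname }} ({{ ansible_facts['architecture'] }})"
      delegate_to: localhost
      run_once: true
```

If the host printed here has a different architecture from some of the other hosts in the play, those hosts would be expected to hit the missing-binary problem.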
@ishanjainn I can't figure out why the same Docker image with the roles downloads both binaries on my PC, while the GitLab pipeline downloads only one (which is correct according to the role's run_once).
The only difference is that my PC is an M2 MacBook (emulated amd64 Docker image), while the GitLab runner runs on an amd64 Ubuntu Linux VM.
The issue is indeed that the task has "run_once". It downloads the zip according to the facts of the first host; if that host has a different CPU architecture than the others, that causes the issue described.
Until this gets fixed, the simplest workaround is to separate the hosts by CPU architecture in the playbook that executes the role.
Something like this:
[amd64_hosts]
example.host.tld
[arm64_hosts]
arm.host.tld
---
- name: Grafana agent on amd64 hosts
  hosts: amd64_hosts
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on arm64 hosts
  hosts: arm64_hosts
  roles:
    - role: grafana.grafana.grafana_agent
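If maintaining separate inventory groups by hand is not practical, the same split could probably be built at runtime with ansible.builtin.group_by. This is a sketch only: the "arch_" prefix is an arbitrary choice, and the group names assume the architecture fact reports x86_64 and aarch64, which may differ on your hosts.

```yaml
---
# Sketch only: builds per-architecture groups from gathered facts.
- name: Group hosts by CPU architecture
  hosts: all
  gather_facts: true
  tasks:
    - name: Create groups such as arch_x86_64 or arch_aarch64
      ansible.builtin.group_by:
        key: "arch_{{ ansible_facts['architecture'] }}"

# Each play below gets its own run_once evaluation, so each
# architecture downloads its own archive.
- name: Grafana agent on x86_64 hosts
  hosts: arch_x86_64
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on aarch64 hosts
  hosts: arch_aarch64
  roles:
    - role: grafana.grafana.grafana_agent
```

This works for the same reason as the static groups: run_once applies per play, so every play contains hosts of a single architecture.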
Based on the message in the Grafana Agent documentation:
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
I believe this can be closed, and migration to Alloy is required. @ishanjainn, what are your thoughts?
I have two hosts in my inventory. One machine is amd64 and the other is arm64.
When I run ansible-playbook on my PC, it works fine.
The same playbook in a GitLab CI/CD pipeline does not repeat the archive download and fetches only the amd64 binary.
Looking at the role task, it has the option "run_once: true". Now I'm confused about why the download was repeated in my local environment, while the pipeline honored the run_once parameter.
Anyway, I think run_once should not be here, or this should be solved in some other way. On the other hand, run_once is handy when I run the playbook over a large number of VMs.
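One possible direction, as a sketch rather than a tested patch for the role: drop run_once from the download task and serialize it with throttle instead. This assumes _grafana_agent_download_url and _grafana_agent_cpu_arch resolve per host, as the templating in the role task suggests. Because dest is a full file path, get_url does not download again when that file already exists, so in principle only the first host of each architecture triggers a real download.

```yaml
# Sketch only: a variant of the role's download task without run_once.
- name: Download Grafana Agent archive to local folder (once per architecture)
  become: false
  ansible.builtin.get_url:
    url: "{{ _grafana_agent_download_url }}"
    dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
    mode: "0664"
  register: _download_archive
  until: _download_archive is succeeded
  retries: 5
  delay: 2
  delegate_to: localhost
  check_mode: false
  # No run_once here: the task is evaluated for every host, so each
  # architecture gets its own archive on the controller.
  # throttle: 1 runs the task for one host at a time, which avoids two
  # hosts writing the same file concurrently.
  throttle: 1
```

The trade-off is that the task is still evaluated once per host, so a large fleet pays for a serialized existence check on the controller. The extract step would need the same treatment.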