grafana / grafana-ansible-collection

grafana.grafana Ansible collection provides modules and roles for managing various resources on Grafana Cloud and roles to manage and deploy Grafana Agent and Grafana
https://docs.ansible.com/ansible/latest/collections/grafana/grafana/index.html#plugins-in-grafana-grafana
GNU General Public License v3.0

BUG: Download grafana agent archive to local folder in case of different arch #166

Open davordbetter opened 7 months ago

davordbetter commented 7 months ago

I have two hosts in my inventory: one machine is amd64 and the other is arm64.

When I run ansible-playbook on my PC, it works fine:

TASK [grafana.grafana.grafana_agent : Create Grafana Agent temp directory] ****************************************************************************************************************************************
ok: [mon-vm -> localhost]

TASK [grafana.grafana.grafana_agent : Download Grafana Agent archive to local folder] *****************************************************************************************************************************
changed: [mon-vm -> localhost]
changed: [dev-be1 -> localhost]

TASK [grafana.grafana.grafana_agent : Extract grafana-agent.zip] **************************************************************************************************************************************************
.fcst....?? grafana-agent-linux-arm64
changed: [mon-vm -> localhost]
.fcst....?? grafana-agent-linux-amd64
changed: [dev-be1 -> localhost]

TASK [grafana.grafana.grafana_agent : Set local path] *************************************************************************************************************************************************************
ok: [mon-vm]
ok: [dev-be1]

TASK [grafana.grafana.grafana_agent : Propagate downloaded binary] ************************************************************************************************************************************************
ok: [mon-vm]
diff skipped: destination file appears to be binary
diff skipped: source file size is greater than 104448
changed: [dev-be1]

However, the same playbook in a GitLab CI/CD pipeline does not repeat the download. It fetches only one binary (arm64 in the log below), so the amd64 host fails:

TASK [grafana.grafana.grafana_agent : Create Grafana Agent temp directory] *****
--- before
+++ after
@@ -1,5 +1,5 @@
 {
-    "mode": "0755",
+    "mode": "0751",
     "path": "/tmp/grafana-agent",
-    "state": "absent"
+    "state": "directory"
 }
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Download Grafana Agent archive to local folder] ***
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Extract grafana-agent.zip] ***************
>f++++++.?? grafana-agent-linux-arm64
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Set local path] **************************
ok: [ssxmon-vm]
ok: [ssxdev-be1]
TASK [grafana.grafana.grafana_agent : Propagate downloaded binary] *************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [ssxdev-be1]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/grafana-agent/grafana-agent-linux-amd64' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
ok: [ssxmon-vm]

Looking at the role's task:

    - name: Download Grafana Agent archive to local folder
      become: false
      ansible.builtin.get_url:
        url: "{{ _grafana_agent_download_url }}"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        mode: 0664
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false
      run_once: true

It has the option "run_once: true". Now I'm confused about why the download was repeated in my local environment, while the pipeline honored the run_once parameter.

Anyway, I think run_once should not be here, or this should be solved in some other way. On the other hand, run_once is handy when I run the playbook over a large number of VMs.

devmittal02 commented 7 months ago

Did you find a workaround for this? I'm getting the same issue when running the role against a group of hosts with both arm64 and amd64 architectures.

ishanjainn commented 7 months ago

Hey @devmittal02, I haven't checked this out, as we are building a new role for Grafana Agent in flow mode (the recommended way now), so we can probably test this on that role.

If you want to double-check, we have a PR open, so I can include any changes you want in it right now.

davordbetter commented 7 months ago

My "workaround" is to put the arm and amd VMs in different groups and run two pipelines with an inventory limit (-l).
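
One way to express this split in a pipeline is sketched below as a GitLab CI fragment. The job names, the inventory path, the playbook name, and the group names `amd64_hosts` and `arm64_hosts` are placeholders for illustration, not taken from the actual setup.

```yaml
# .gitlab-ci.yml (sketch): one job per architecture group.
# All names and paths below are hypothetical placeholders.
deploy_agent_amd64:
  script:
    # -l / --limit restricts the run to the amd64 group only
    - ansible-playbook -i inventory/hosts playbook.yml -l amd64_hosts

deploy_agent_arm64:
  script:
    # Second run, limited to the arm64 group
    - ansible-playbook -i inventory/hosts playbook.yml -l arm64_hosts
```

Because each run sees hosts of a single architecture, the run_once download always matches every host in that run.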

ishanjainn commented 7 months ago

This seems like a very weird issue. @davordbetter, any thoughts on why this fails specifically on GitLab?

@devmittal02 What platform are you running the playbook on?

devmittal02 commented 7 months ago

Hey, I think the issue is caused by this run_once. I am running on AWX against an entire fleet of EC2 machines; it spins up an on-demand container and triggers the playbook across the machines using SSM.

What happens is this: say the first machine it runs against is AMD, so it downloads the binary for that architecture only and stores it locally. When an ARM machine comes next, the download step is skipped because of run_once, and only the AMD variant of the binary is available to copy. Hence the file-not-found error: the binary the ARM host expects was never downloaded.

- name: Download Grafana Agent binary to controller (localhost)
  block:
    - name: Create Grafana Agent temp directory
      become: false
      ansible.builtin.file:
        path: "{{ grafana_agent_local_tmp_dir }}"
        state: directory
        mode: 0751
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Download Grafana Agent archive to local folder
      become: false
      ansible.builtin.get_url:
        url: "{{ _grafana_agent_download_url }}"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        mode: 0664
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Extract grafana-agent.zip
      become: false
      ansible.builtin.unarchive:
        src: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        dest: "{{ grafana_agent_local_tmp_dir }}"
        remote_src: false
      delegate_to: localhost
      run_once: true
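
A possible direction for a fix, sketched under assumptions rather than taken from the role: keep run_once, but loop over every distinct architecture found among the hosts in the play, so one archive per architecture lands on the controller. The variables `example_arch_map`, `example_archs`, and `example_release_base_url`, as well as the URL layout, are hypothetical; the role builds its real URL in `_grafana_agent_download_url`, whose definition is not shown in this thread. The sketch also assumes facts have been gathered for all hosts.

```yaml
- name: Download Grafana Agent for every architecture in the play (sketch)
  vars:
    # Hypothetical mapping from the gathered fact to the release naming
    example_arch_map:
      x86_64: amd64
      aarch64: arm64
    # Distinct architectures across all hosts in the current play
    example_archs: >-
      {{ ansible_play_hosts
      | map('extract', hostvars, ['ansible_facts', 'architecture'])
      | map('extract', example_arch_map)
      | unique | list }}
  block:
    - name: Download one archive per architecture to the controller
      become: false
      ansible.builtin.get_url:
        # URL layout is an assumption for illustration only
        url: "{{ example_release_base_url }}/v{{ grafana_agent_version }}/grafana-agent-linux-{{ item }}.zip"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ item }}_{{ grafana_agent_version }}.zip"
        mode: "0664"
      loop: "{{ example_archs }}"
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Extract every downloaded archive on the controller
      become: false
      ansible.builtin.unarchive:
        src: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ item }}_{{ grafana_agent_version }}.zip"
        dest: "{{ grafana_agent_local_tmp_dir }}"
        remote_src: false
      loop: "{{ example_archs }}"
      delegate_to: localhost
      run_once: true
```

With both archives extracted, the later per-host copy step should find the binary matching each host's own architecture, while the download still happens only once per architecture rather than once per host.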

davordbetter commented 7 months ago

@ishanjainn I can't figure out why the same Docker image with the roles downloads both binaries when run on my PC, but only one in the GitLab pipeline (which is correct according to the role's run_once).

The only difference is that my PC is an M2 MacBook (emulated amd64 Docker image), while the GitLab runner runs on an amd64 Ubuntu Linux VM.

gardar commented 7 months ago

The issue is indeed that the task has "run_once". It downloads the zip according to the facts of the first host; if that host has a different CPU architecture than the others, that causes the issue described.
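
The behavior can be observed with a minimal playbook like the sketch below, which is not part of the role. It assumes an inventory with hosts of mixed architectures and uses only the built-in debug module.

```yaml
---
- name: Show which host's facts a run_once task is templated with
  hosts: all
  gather_facts: true
  tasks:
    - name: Print the architecture seen by a delegated run_once task
      ansible.builtin.debug:
        msg: "Templated for {{ inventory_hostname }} ({{ ansible_facts['architecture'] }})"
      delegate_to: localhost
      # Executes for a single host of the batch; the others never template this task
      run_once: true
```

Only one message is printed, templated with the variables of the one host the task runs for, even though the task is delegated to localhost. Any URL or file name built from per-host facts in such a task therefore reflects that single host.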

Until this gets fixed the simplest workaround would be to separate the hosts based on cpu architecture in the playbook that executes the role.

Something like this:

inventory/hosts

[amd64_hosts]
example.host.tld

[arm64_hosts]
arm.host.tld

playbook.grafana_agent.yml

---
- name: Grafana agent on amd64 hosts
  hosts: amd64_hosts
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on arm64 hosts
  hosts: arm64_hosts
  roles:
    - role: grafana.grafana.grafana_agent
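
If maintaining static inventory groups is inconvenient, a variation on the same workaround is to build the groups at runtime from gathered facts with `ansible.builtin.group_by`. This is a sketch, assuming the hosts report `x86_64` or `aarch64` as their architecture fact; the `arch_` group prefix is an arbitrary choice.

```yaml
---
- name: Group hosts by CPU architecture
  hosts: all
  gather_facts: true
  tasks:
    - name: Create arch_<architecture> groups from gathered facts
      ansible.builtin.group_by:
        key: "arch_{{ ansible_facts['architecture'] }}"

- name: Grafana agent on x86_64 hosts
  hosts: arch_x86_64
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on aarch64 hosts
  hosts: arch_aarch64
  roles:
    - role: grafana.grafana.grafana_agent
```

Each play then contains hosts of a single architecture, so the role's run_once download matches all of them.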

voidquark commented 2 weeks ago

Based on the message in the Grafana Agent documentation:

Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.

I believe this can be closed, and migration to Alloy is required. @ishanjainn, what are your thoughts?