grafana / grafana-ansible-collection

The grafana.grafana Ansible collection provides modules and roles for managing resources on Grafana Cloud, plus roles to deploy and manage Grafana and Grafana Agent.
https://docs.ansible.com/ansible/latest/collections/grafana/grafana/index.html#plugins-in-grafana-grafana
GNU General Public License v3.0

grafana.grafana.alloy role installs alloy binary under /etc/alloy/, which does not work on RHEL-based systems due to SELinux #194

Open hakong opened 2 months ago

hakong commented 2 months ago

Binaries should be in binary folders on RHEL-based systems, so SELinux allows them to do binary-like things, like connecting to the internet 😄

The alloy binary gets an SELinux label like unconfined_u:object_r:etc_t:s0 when placed under /etc/alloy/, and that type is not allowed to open TCP sockets:

type=AVC msg=audit(1714732141.534:5214): avc:  denied  { name_connect } for  pid=60115 comm="alloy-linux-amd" dest=443 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:http_port_t:s0 tclass=tcp_socket permissive=0
type=AVC msg=audit(1714732146.536:5215): avc:  denied  { name_connect } for  pid=60115 comm="alloy-linux-amd" dest=443 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:http_port_t:s0 tclass=tcp_socket permissive=0
type=AVC msg=audit(1714732146.537:5216): avc:  denied  { name_connect } for  pid=60115 comm="alloy-linux-amd" dest=443 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:http_port_t:s0 tclass=tcp_socket permissive=0
type=AVC msg=audit(1714732151.538:5217): avc:  denied  { name_connect } for  pid=60115 comm="alloy-linux-amd" dest=443 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:http_port_t:s0 tclass=tcp_socket permissive=0

The RPM package installs alloy under /usr/bin/, which is correct.

The grafana.grafana.alloy role should use the package manager to install alloy. That would solve this issue:

[root@container-1 ~]# rpm -qlp alloy-1.0.0-1.amd64.rpm
/etc/alloy/config.alloy
/etc/sysconfig/alloy
/usr/bin/alloy
/usr/lib/systemd/system/alloy.service

gardar commented 2 months ago

Can you PR it up @hakong ?

panfantastic commented 1 month ago

I can help with this @hakong @gardar as I'm looking to deploy alloy into production RHEL environments in the coming weeks using this ansible role and this will be an issue for me also.

Alternatively, or in addition to, an SELinux policy could be created that allows the existing binary deployment method as an interim.
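
For the relabeling route, a rough sketch of what that could look like as Ansible tasks (hedged: this assumes the role places the binary at /etc/alloy/alloy — the AVC log above suggests the actual filename may be something like alloy-linux-amd64 — and that the community.general collection plus the SELinux management tools are available on the host):

```yaml
# Interim workaround sketch: relabel the binary's path so SELinux treats it
# as an executable (bin_t) instead of configuration (etc_t).
- name: Map the alloy binary path to the bin_t SELinux type
  community.general.sefcontext:
    target: '/etc/alloy/alloy'   # assumed path; adjust to the role's actual binary name
    setype: bin_t
    state: present

- name: Apply the new context to the file on disk
  ansible.builtin.command: restorecon -v /etc/alloy/alloy
  register: restorecon_result
  changed_when: restorecon_result.stdout | length > 0
```

A proper fix in the role would still be preferable, since any re-download of the binary would need the context re-applied.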

hakong commented 1 month ago

@panfantastic If you want a quick-and-dirty interim solution, rather than messing with SELinux you can just use something like the tasks below. It's what I used and it worked fine: the RPM package installs alloy in a proper bin directory, so it works with SELinux out of the box.

Note: at least on Debian, I had an issue where alloy would not start because the /var/lib/alloy directory and /etc/default/alloy did not exist. As far as I remember, the DEB package did not create them. The RPM package worked fine.
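If you hit that same missing-path problem on Debian, a couple of tasks like these could create the paths from the note above before the service starts (hedged sketch: the alloy owner/group and modes are assumptions, not verified against the packages):

```yaml
- name: Ensure the Alloy data directory exists
  ansible.builtin.file:
    path: /var/lib/alloy
    state: directory
    owner: alloy     # assumed service user created by the package
    group: alloy
    mode: '0750'

- name: Ensure the environment file exists (Debian reads /etc/default/alloy)
  ansible.builtin.file:
    path: /etc/default/alloy
    state: touch
    owner: root
    group: root
    mode: '0644'
    modification_time: preserve   # keep the task idempotent across runs
    access_time: preserve
```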

- name: Configure Grafana YUM repository
  ansible.builtin.copy:
    dest: /etc/yum.repos.d/grafana.repo
    owner: root
    group: root
    mode: '0644'
    content: |
      [grafana]
      name=grafana
      baseurl=https://rpm.grafana.com
      repo_gpgcheck=1
      enabled=1
      gpgcheck=1
      gpgkey=https://rpm.grafana.com/gpg.key
      sslverify=1
      sslcacert=/etc/pki/tls/certs/ca-bundle.crt

- name: Install alloy
  ansible.builtin.package:
    name: alloy
    state: present

- name: Configure Alloy
  vars:
    prometheus_push_endpoint: "https://prometheus-prod-24-prod-eu-west-2.grafana.net/api/prom/push" # Update with your Prometheus endpoint
    loki_endpoint: "https://logs-prod-012.grafana.net/loki/api/v1/push" # Update with your Loki endpoint
    prometheus_username: "x"  # Update with your Prometheus username
    prometheus_password: "x"  # Update with your Prometheus password
    loki_username: "x"  # Update with your Loki username, same as Grafana Cloud username if you are using Grafana Cloud
    loki_password: "x"  # Update with your Loki password, same as your Grafana Cloud password/API key if you are using Grafana Cloud
  ansible.builtin.copy:
    dest: /etc/alloy/config.alloy
    content: |
      prometheus.exporter.self "integrations_alloy" { }

      discovery.relabel "integrations_alloy" {
        targets = prometheus.exporter.self.integrations_alloy.targets

        rule {
          target_label = "instance"
          replacement  = constants.hostname
        }

        rule {
          target_label = "alloy_hostname"
          replacement  = constants.hostname
        }

        rule {
          target_label = "job"
          replacement  = "integrations/alloy-check"
        }
      }

      prometheus.scrape "integrations_alloy" {
        targets    = discovery.relabel.integrations_alloy.output
        forward_to = [prometheus.relabel.integrations_alloy.receiver]

        scrape_interval = "60s"
      }

      prometheus.relabel "integrations_alloy" {
        forward_to = [prometheus.remote_write.metrics_service.receiver]

        rule {
          source_labels = ["__name__"]
          regex         = "(prometheus_target_sync_length_seconds_sum|prometheus_target_scrapes_.*|prometheus_target_interval.*|prometheus_sd_discovered_targets|alloy_build.*|prometheus_remote_write_wal_samples_appended_total|process_start_time_seconds)"
          action        = "keep"
        }
      }

      prometheus.remote_write "metrics_service" {
        endpoint {
          url = "{{ prometheus_push_endpoint }}"

          basic_auth {
            username = "{{ prometheus_username }}"
            password = "{{ prometheus_password }}"
          }
        }
      }

      loki.write "grafana_cloud_loki" {
        endpoint {
          url = "{{ loki_endpoint }}"

          basic_auth {
            username = "{{ loki_username }}"
            password = "{{ loki_password }}"
          }
        }
      }
      discovery.relabel "integrations_node_exporter" {
        targets = prometheus.exporter.unix.integrations_node_exporter.targets

        rule {
          target_label = "instance"
          replacement  = constants.hostname
        }

        rule {
          target_label = "job"
          replacement = "integrations/node_exporter"
        }
      }

      prometheus.exporter.unix "integrations_node_exporter" {
        disable_collectors = ["ipvs", "btrfs", "infiniband", "xfs", "zfs"]

        filesystem {
          fs_types_exclude     = "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
          mount_points_exclude = "^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+)($|/)"
          mount_timeout        = "5s"
        }

        netclass {
          ignored_devices = "^(veth.*|cali.*|[a-f0-9]{15})$"
        }

        netdev {
          device_exclude = "^(veth.*|cali.*|[a-f0-9]{15})$"
        }
      }

      prometheus.scrape "integrations_node_exporter" {
        targets    = discovery.relabel.integrations_node_exporter.output
        forward_to = [prometheus.relabel.integrations_node_exporter.receiver]
      }

      prometheus.relabel "integrations_node_exporter" {
        forward_to = [prometheus.remote_write.metrics_service.receiver]

        rule {
          source_labels = ["__name__"]
          regex         = "up|node_arp_entries|node_boot_time_seconds|node_context_switches_total|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_io_time_weighted_seconds_total|node_disk_read_bytes_total|node_disk_read_time_seconds_total|node_disk_reads_completed_total|node_disk_write_time_seconds_total|node_disk_writes_completed_total|node_disk_written_bytes_total|node_filefd_allocated|node_filefd_maximum|node_filesystem_avail_bytes|node_filesystem_device_error|node_filesystem_files|node_filesystem_files_free|node_filesystem_readonly|node_filesystem_size_bytes|node_intr_total|node_load1|node_load15|node_load5|node_md_disks|node_md_disks_required|node_memory_Active_anon_bytes|node_memory_Active_bytes|node_memory_Active_file_bytes|node_memory_AnonHugePages_bytes|node_memory_AnonPages_bytes|node_memory_Bounce_bytes|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_CommitLimit_bytes|node_memory_Committed_AS_bytes|node_memory_DirectMap1G_bytes|node_memory_DirectMap2M_bytes|node_memory_DirectMap4k_bytes|node_memory_Dirty_bytes|node_memory_HugePages_Free|node_memory_HugePages_Rsvd|node_memory_HugePages_Surp|node_memory_HugePages_Total|node_memory_Hugepagesize_bytes|node_memory_Inactive_anon_bytes|node_memory_Inactive_bytes|node_memory_Inactive_file_bytes|node_memory_Mapped_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_memory_SReclaimable_bytes|node_memory_SUnreclaim_bytes|node_memory_ShmemHugePages_bytes|node_memory_ShmemPmdMapped_bytes|node_memory_Shmem_bytes|node_memory_Slab_bytes|node_memory_SwapTotal_bytes|node_memory_VmallocChunk_bytes|node_memory_VmallocTotal_bytes|node_memory_VmallocUsed_bytes|node_memory_WritebackTmp_bytes|node_memory_Writeback_bytes|node_netstat_Icmp6_InErrors|node_netstat_Icmp6_InMsgs|node_netstat_Icmp6_OutMsgs|node_netstat_Icmp_InErrors|node_netstat_Icmp_InMsgs|node_netstat_Icmp_OutMsgs|node_netstat_IpExt_InOctets|node_netstat_IpExt_OutOctets|node_netstat_TcpExt_ListenDrops|node_netstat_TcpExt_ListenOverflows|node_netstat_TcpExt_TCPSynRetrans|node_netstat_Tcp_InErrs|node_netstat_Tcp_InSegs|node_netstat_Tcp_OutRsts|node_netstat_Tcp_OutSegs|node_netstat_Tcp_RetransSegs|node_netstat_Udp6_InDatagrams|node_netstat_Udp6_InErrors|node_netstat_Udp6_NoPorts|node_netstat_Udp6_OutDatagrams|node_netstat_Udp6_RcvbufErrors|node_netstat_Udp6_SndbufErrors|node_netstat_UdpLite_InErrors|node_netstat_Udp_InDatagrams|node_netstat_Udp_InErrors|node_netstat_Udp_NoPorts|node_netstat_Udp_OutDatagrams|node_netstat_Udp_RcvbufErrors|node_netstat_Udp_SndbufErrors|node_network_carrier|node_network_info|node_network_mtu_bytes|node_network_receive_bytes_total|node_network_receive_compressed_total|node_network_receive_drop_total|node_network_receive_errs_total|node_network_receive_fifo_total|node_network_receive_multicast_total|node_network_receive_packets_total|node_network_speed_bytes|node_network_transmit_bytes_total|node_network_transmit_compressed_total|node_network_transmit_drop_total|node_network_transmit_errs_total|node_network_transmit_fifo_total|node_network_transmit_multicast_total|node_network_transmit_packets_total|node_network_transmit_queue_length|node_network_up|node_nf_conntrack_entries|node_nf_conntrack_entries_limit|node_os_info|node_sockstat_FRAG6_inuse|node_sockstat_FRAG_inuse|node_sockstat_RAW6_inuse|node_sockstat_RAW_inuse|node_sockstat_TCP6_inuse|node_sockstat_TCP_alloc|node_sockstat_TCP_inuse|node_sockstat_TCP_mem|node_sockstat_TCP_mem_bytes|node_sockstat_TCP_orphan|node_sockstat_TCP_tw|node_sockstat_UDP6_inuse|node_sockstat_UDPLITE6_inuse|node_sockstat_UDPLITE_inuse|node_sockstat_UDP_inuse|node_sockstat_UDP_mem|node_sockstat_UDP_mem_bytes|node_sockstat_sockets_used|node_softnet_dropped_total|node_softnet_processed_total|node_softnet_times_squeezed_total|node_systemd_unit_state|node_textfile_scrape_error|node_time_zone_offset_seconds|node_timex_estimated_error_seconds|node_timex_maxerror_seconds|node_timex_offset_seconds|node_timex_sync_status|node_uname_info|node_vmstat_oom_kill|node_vmstat_pgfault|node_vmstat_pgmajfault|node_vmstat_pgpgin|node_vmstat_pgpgout|node_vmstat_pswpin|node_vmstat_pswpout|process_max_fds|process_open_fds"
          action        = "keep"
        }
      }

panfantastic commented 1 month ago

Yes, we need to split the installs between redhat and debian. Thanks for your config, I'll try and get a PR together this weekend for this stuff unless you have something ready to go.

panfantastic commented 1 month ago

@hakong Am I correct to say you were running with SELinux in enforcing mode (getenforce returning Enforcing)?

I've been trying to get the build environment's tests to run with SELinux enabled, the way you get on RHEL by default, and have failed so far.

RHEL enforces SELinux by default, but the Rocky etc. containers I'm trying to test with don't :( I'm not sure how to submit a patch with Molecule testing at this point!

voidquark commented 1 month ago

If you're seeking inspiration, perhaps align your PR with the Loki/Promtail approach. They already support Debian/RHEL systems with SELinux out of the box.

panfantastic commented 1 month ago

@voidquark can you link me please?

voidquark commented 1 month ago

> @voidquark can you link me please?

Promtail role and Loki role

panfantastic commented 1 month ago

I took the opportunity to install it on a working SELinux (enforcing) system, and it installs fine. So I think all that is needed is to split the install process between Red Hat clones and Debian clones (sorry, any SUSE clones; Gentoo users know how to do it on their own ;) ).
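
A hedged sketch of what that split could look like, keyed off the os_family fact (illustrative only, not the role's actual task layout; it assumes the Grafana RPM/APT repositories have already been configured, e.g. by a repo task like the one earlier in this thread):

```yaml
# Install from the distro-appropriate Grafana package repository,
# so the binary lands in /usr/bin with the right SELinux label on RHEL.
- name: Install Alloy via RPM on Red Hat family systems
  ansible.builtin.package:
    name: alloy
    state: present
  when: ansible_facts['os_family'] == 'RedHat'

- name: Install Alloy via APT on Debian family systems
  ansible.builtin.apt:
    name: alloy
    state: present
    update_cache: true
  when: ansible_facts['os_family'] == 'Debian'
```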

Aethylred commented 1 month ago

Keeping an eye on this, as the documentation for Grafana Agent says it's being deprecated in favor of Alloy, and we run RHEL.