socallinuxexpo / scale-network

SCaLE's on-site expo network configurations, wifi, tooling, and scripts
https://www.socallinuxexpo.org/
BSD 3-Clause "New" or "Revised" License

Openwrt seems to have a prometheus exporter. #641

Closed nixinator closed 5 months ago

nixinator commented 7 months ago

It seems that OpenWrt has a Prometheus exporter now.

Shall we add it, and then set up Grafana and Prometheus to monitor and display the data?

Could be cool.

I don't know if the Juniper gear has a Prometheus exporter. It would be nice to enable it if it does.

https://grafana.com/blog/2021/02/09/how-i-monitor-my-openwrt-router-with-grafana-cloud-and-prometheus/

owendelong commented 7 months ago

If someone is so inclined, they could experiment with this: 

I don’t know anything about it (or prometheus for that matter).

Owen


nixinator commented 7 months ago

We just need to add this code to a semi-powerful machine, as things get quite busy on the monitoring side and can create considerable CPU/disk load; the network load, however, is fairly OK.


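{ config, pkgs, ... }: {
  # Grafana configuration. On NixOS 22.11 and later the server options
  # live under settings.server (replacing the older domain/port/addr).
  services.grafana = {
    enable = true;
    settings.server = {
      domain = "grafana.pele";
      http_port = 2342;
      http_addr = "127.0.0.1";
    };
  };

  # nginx reverse proxy in front of Grafana
  services.nginx.enable = true;
  services.nginx.virtualHosts.${config.services.grafana.settings.server.domain} = {
    locations."/" = {
      proxyPass = "http://127.0.0.1:${toString config.services.grafana.settings.server.http_port}";
      proxyWebsockets = true;
    };
  };
}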

On the OpenWrt side, I presume the Prometheus exporter is just a package.

# hosts/chrysalis/configuration.nix
  services.prometheus = {
    enable = true;
    port = 9001;
  };
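
To actually pull metrics from an AP, the Prometheus server above would also need a scrape job, and Grafana would need a data source. A minimal sketch, assuming the OpenWrt exporter listens on port 9100 (a common exporter default that should be checked on the device) and using a hypothetical hostname `ap-test.example`:

```nix
{ config, ... }: {
  # Scrape job for the OpenWrt exporter. The job name, target hostname
  # and port are assumptions for illustration only.
  services.prometheus.scrapeConfigs = [
    {
      job_name = "openwrt";
      scrape_interval = "30s";
      static_configs = [
        { targets = [ "ap-test.example:9100" ]; }
      ];
    }
  ];

  # Point Grafana at the local Prometheus instance.
  services.grafana.provision = {
    enable = true;
    datasources.settings.datasources = [
      {
        name = "Prometheus";
        type = "prometheus";
        url = "http://127.0.0.1:${toString config.services.prometheus.port}";
      }
    ];
  };
}
```

Starting with a single test AP in `targets` would make it easy to see how much load the exporter puts on the device before rolling it out further.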

I don't have any OpenWrt boxes available, but if we can deploy the agent to a test box, we can take a look at the OpenWrt Prometheus agent and see whether it thrashes the box or runs without any performance problems.

@rob, do you want to have a crack at this on tmate sometime, if you have OpenWrt hardware available to you?

@kylerisse, I think there might be a hookable API for events. We could use that to do some dynamic configuration of the APs depending on the event. That's a bit Star Trek, but feasible.

We could limit the number of clients associated per Wi-Fi channel/AP, for instance, which would be quite a cool little load-balancing script.

https://grafana.com/blog/2021/02/09/how-i-monitor-my-openwrt-router-with-grafana-cloud-and-prometheus/

davidelang commented 7 months ago

How does the data that the Prometheus exporter provides compare with what we have configured Zabbix to gather in the past from the OpenWrt systems?

I'd have to dig into the configs for specifics, but besides the standard Linux stats, our Zabbix setup also includes a lot of Wi-Fi-related stats.

We also gather Wi-Fi details via syslog (dumps of which endpoints are connected, as far as I remember, but I think there are a couple of other things as well).

David Lang

owendelong commented 7 months ago

All else being equal, I’d generally rather use Apache than NGINX unless there’s some significant reason to do otherwise. Most of my interactions with NGINX have disabused me of my prior belief that Apache was the most difficult thing one could possibly try to configure.

Owen

sarcasticadmin commented 7 months ago

Prometheus is definitely a potential option. My only hesitation is that it would be nice to limit which subnet(s) can access the Prometheus endpoint, since it's a pull model. Currently the APs don't have a firewall since we never needed one.

I've been more interested in exploring collectd, which is a push model. That way we can just point everything to the monitoring machine and leverage the firewall on it to limit which subnets can send traffic to it.
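
For the push model, that restriction could live entirely on the monitoring host. A rough sketch, assuming collectd's network plugin on its default UDP port 25826, the iptables-based NixOS firewall, and a placeholder management subnet of 10.0.0.0/24 (not the real SCaLE addressing):

```nix
{ ... }: {
  networking.firewall = {
    enable = true;
    # Accept collectd traffic only from the (placeholder) management subnet.
    # The port is deliberately not listed in allowedUDPPorts, so packets
    # from any other source fall through to the default drop rule.
    extraCommands = ''
      iptables -A nixos-fw -p udp --dport 25826 -s 10.0.0.0/24 -j nixos-fw-accept
    '';
  };
}
```

With this approach the APs still need no firewall of their own, since they only send traffic and never listen.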

owendelong commented 7 months ago

If it has a deterministic port number or such, we could add rules to the management subnet on the routers. Is it possible to configure the prometheus exporter to listen only on the management interface?

davidelang commented 7 months ago

I will say that monitoring can be more CPU and disk I/O intensive than many people think, especially if it's a monitoring tool that puts data into a database for later queries (Zabbix does this, as does at least one of the other tools people have been talking about). And if the system polls for each data point individually (SNMP, for example) instead of using the push model (Zabbix or collectd), it takes a lot more CPU than you would think.

Especially in this environment, I lean towards overdoing monitoring. We don't get a second chance to gather data and tune the monitoring, so I want to gather everything I can. Even if we don't look at it much during the show, it's available for after-the-fact analysis so we can find things we didn't see during the show.

As a result, even with only a couple hundred devices to monitor, it ends up being a lot of data points, probably putting us in a similar ballpark to more tuned monitoring systems of a few thousand devices.

David Lang
