socallinuxexpo / scale-network

SCaLE's on-site expo network configurations, wifi, tooling, and scripts
https://www.socallinuxexpo.org/
BSD 3-Clause "New" or "Revised" License

[READY] - monitoring: grafana and prometheus service enabled #642

Closed sarcasticadmin closed 5 months ago

sarcasticadmin commented 7 months ago

Description of PR

Relates to: #641 #567

This enables the foundations for the grafana and prometheus services on the monitoring VM. I originally wanted to leverage collectd instead of prometheus, but that ended up being more trouble than it was worth with influxdbv2 or graphite. Landing on prometheus should give us a good base, and we can restrict access to the webserver for scraping in a few ways. Grafana's admin password is currently set on first login when creating a VM.

Currently we still need to:

This will be done in follow up PRs if this initial approach is approved.

Prometheus is also added as a common nixos module so that it can be consumed from other machines.

[Screenshot: ss-202311251700943674]

Previous Behavior

New Behavior

Tests

From another nixos machine:

nix build .#nixosConfigurations.monitor.config.system.build.vm -L
export QEMU_NET_OPTS="hostfwd=tcp::2222-:22"
./result/bin/run-nixos-vm

Setting up port forwarding to expose on localhost:8000:

ssh -o StrictHostKeyChecking=no -p 2222 rherna@127.0.0.1 -L 8000:127.0.0.1:80

nixinator commented 7 months ago

Nice work on this @sarcasticadmin, I'm currently testing your changes...

ON TLS

We're going to require certs, and I presume ACME is out because it's on an internal IPv4 subnet.

Hmmm....

We could get everything running on IPv6, and then we could get ACME, if ACME supports IPv6. This would require our internal servers to have public IPv6 DNS entries. I don't know how I feel about that... maybe I should feel good.

Failing that, we're going to need our own CA and start shunting certs about, which might be a massive PITA. However, it would be a good opportunity to build a Nix internal ACME system along the lines of https://smallstep.com/blog/private-acme-server/

However, this may not be an insignificant amount of work.

davidelang commented 7 months ago

perfect is the enemy of good enough :-)

David Lang

On Sun, 26 Nov 2023, Lee Hughes wrote:

Nice work on this Rob, I'm currently testing your changes...

owendelong commented 7 months ago

If people don’t mind importing my Root certificate (available at https://www.delong.com/), then I have a CA already set up that we can use to issue all the certs we need.

Owen

On Nov 26, 2023, at 19:08, Lee Hughes @.***> wrote:

Nice work on this Rob, I'm currently testing your changes...

nixinator commented 7 months ago

I have to remember that @owendelong doesn't use the internet, he is the internet. ;-)

owendelong commented 7 months ago

Ah, but what will really bake your mind is that there is no such thing as “The Internet”.

What we call “The Internet” is a very loosely coupled set of networks that happen to have agreed to use the same protocol(s) and exchange traffic with each other.

Owen

sarcasticadmin commented 7 months ago

Nice work on this @sarcasticadmin, I'm currently testing your changes...

@nixinator thanks for testing.

However, this may not be an insignificant amount of work.

At this point my plan is to do one of the following:

  1. Issue a self-signed certificate at runtime. This will require anyone connecting to the web UI to "trust on first use", but that's good enough in my book for the small number of users for this service.
  2. Make the grafana web service listen only on loopback and require users to port-forward via SSH to get to the web UI. The plus side here is that this doesn't require any self-signed certs and we can leave it on HTTP.
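
A minimal sketch of what option 1 could look like, assuming `openssl` is available on the VM. The function name `make_self_signed_cert`, the file names, and the hostname used below are hypothetical and not taken from this PR; wiring this into a NixOS service is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): generate a throwaway
# self-signed certificate at runtime.
# Usage: make_self_signed_cert <out_dir> <common_name>
make_self_signed_cert() {
  local out_dir="$1" cn="$2"
  mkdir -p "$out_dir" || return 1

  # Reuse an existing cert so restarts keep the same fingerprint;
  # otherwise users would have to re-trust the service after every boot.
  if [ -s "$out_dir/cert.pem" ] && [ -s "$out_dir/key.pem" ]; then
    return 0
  fi

  # -x509  : emit a self-signed certificate instead of a signing request
  # -nodes : leave the private key unencrypted so the service can read it
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$out_dir/key.pem" -out "$out_dir/cert.pem" \
    -days 30 -subj "/CN=$cn" 2>/dev/null || return 1

  # Restrict the private key to its owner.
  chmod 600 "$out_dir/key.pem"
}
```

For example, `make_self_signed_cert /tmp/monitor-tls monitor.example` would leave `cert.pem` and `key.pem` in that directory (path and hostname are illustrative). Anyone connecting could compare the fingerprint from `openssl x509 -in cert.pem -noout -fingerprint -sha256` before trusting it on first use.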

Sounds like in general we're good with this approach though, so I'll mark it as READY.

owendelong commented 7 months ago

On Nov 27, 2023, at 15:58, Robert James Hernandez @.***> wrote:

Nice work on this @sarcasticadmin, I'm currently testing your changes...

@nixinator thanks for testing.

However, this may not be an insignificant amount of work.

At this point my plan is to do one of the following:

  1. Issue a self-signed certificate at runtime. This will require anyone connecting to the web UI to "trust on first use", but that's good enough in my book for the small number of users for this service.
  2. Make the grafana web service listen only on loopback and require users to port-forward via SSH to get to the web UI. The plus side here is that this doesn't require any self-signed certs and we can leave it on HTTP.

Sounds like in general we're good with this approach though, so I'll mark it as READY.

Why not use my CA? Is there a problem with, perhaps, using the same wildcard cert on every box?

Owen

sarcasticadmin commented 7 months ago

Why not use my CA? Is there a problem with, perhaps, using the same wildcard cert on every box?

It's mainly to avoid the bootstrap problem of getting the private key onto the VM in the first place. There are many ways to do this, but I'm trying to keep it to a low level of effort.

owendelong commented 7 months ago

On Nov 27, 2023, at 23:18, Robert James Hernandez @.***> wrote:

Why not use my CA? Is there a problem with, perhaps, using the same wildcard cert on every box?

It's mainly to avoid the bootstrap problem of getting the private key onto the VM in the first place. There are many ways to do this, but I'm trying to keep it to a low level of effort.

TBH, I’m fine with just storing the private key in the server image. Since the cert isn’t being used for authentication and impersonation of our server is a pretty low risk scenario in this case, I don’t see guarding that private key as paramount.

nixinator commented 7 months ago

We may be able to deploy some keys with secrix, so that keys can be bootstrapped from the flake.nix.

However, I'll have to look at it.

owendelong commented 7 months ago

I suspect we will eventually build a full time password vault of some form (IIRC, Rob had some thoughts in this area) and we can probably just use that.

Owen

On Dec 2, 2023, at 16:01, Lee Hughes @.***> wrote:

We may be able to deploy some keys with secrix, so that keys can be bootstrapped from the flake.nix.

However, I'll have to look at it.

nixinator commented 7 months ago

Sure, my current flavour of secret management is secrix.

It's rather nice...

https://journal.platonic.systems/introducing-secrix/