Open jonhiggs opened 3 months ago
I've been thinking about this...
There are two competing priorities:
But that leads to state being distributed, and it's annoyingly difficult to determine what, if anything, is down. You need to query each daemon for its state, or wait for a renotify gate to open.
So I've been thinking about a way to redistribute the state of many daemons for presentation. It probably doesn't need to be a super-reliable path, but if it were, notifications hung off it would get great delivery guarantees.
The state needs to go to a database which something else will present. Some options I've been considering are:
I'm thinking OpenTelemetry is the best option. It opens up a lot of possibilities. I think I'll experiment with fz -> OTLP -> OTel Collector -> ClickHouse -> Metabase.
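For illustration only, here is a minimal sketch of what the first hop (fz -> OTLP) might look like, using just the Python standard library and OTLP/HTTP with JSON encoding. It is not how fz does or will do this. The metric name `fz.check.up`, the scope name `fz.state`, the `check` attribute, and both function names are hypothetical; the URL assumes a Collector on localhost with its OTLP/HTTP receiver on the default port 4318.

```python
import json
import time
import urllib.request

# Assumption: a local OTel Collector with the OTLP/HTTP receiver on its default port.
OTLP_METRICS_URL = "http://localhost:4318/v1/metrics"


def build_state_payload(host, check, is_up, now_ns=None):
    """Encode one check result as an OTLP/JSON gauge data point (1 = up, 0 = down)."""
    if now_ns is None:
        now_ns = time.time_ns()
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "fz"}},
                {"key": "host.name", "value": {"stringValue": host}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "fz.state"},  # hypothetical scope name
                "metrics": [{
                    "name": "fz.check.up",  # hypothetical metric name
                    "gauge": {"dataPoints": [{
                        "attributes": [
                            {"key": "check", "value": {"stringValue": check}},
                        ],
                        # OTLP/JSON carries 64-bit integers as strings.
                        "timeUnixNano": str(now_ns),
                        "asInt": "1" if is_up else "0",
                    }]},
                }],
            }],
        }]
    }


def send_state(payload, url=OTLP_METRICS_URL, timeout=5.0):
    """POST the payload to the Collector and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Each daemon would call something like `send_state(build_state_payload("web1", "disk", False))` whenever a check changes state. Getting the data from the Collector into ClickHouse and Metabase is a matter of Collector and dashboard configuration and is not shown here.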
Also, I hope it goes without saying that using OTel is completely optional. It would make larger deployments manageable, but it certainly has an associated complexity cost.
Work out how to make debugging easier when the tests are distributed across hosts.