yaq-project / yaq-fyi

Main website for yaq documentation.
https://yaq.fyi
Creative Commons Zero v1.0 Universal

watchdog daemon? #36

Closed: untzag closed this issue 2 years ago

untzag commented 2 years ago

I'm opening this issue to ask for community discussion.

The Krishna group has asked for the addition of a new safety daemon. They are aware that mortal safety issues CANNOT be placed PURELY in the hands of software. The point of the daemon is to hopefully prevent their mechanical safety features from kicking in, because recovering from that is a big hassle.

The desired behavior is a system of "checks" (other daemon state) and "consequences" (other daemon action). If any check reads as false, all consequences are activated immediately.

Checks

Consequences

I'm tempted to make a general purpose daemon to achieve this behavior, named yaqd-watchdog. It would have a complex config format with some way of specifying different kinds of checks and consequences.
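To make the proposal concrete, here is a minimal sketch of the check/consequence behavior in plain Python. Everything in it is hypothetical: the `Watchdog` class, the callables, and the furnace and flow names are illustrative stand-ins, not an existing yaq API or config format. In a real daemon the checks would read other daemons' state and the consequences would send them messages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Watchdog:
    """Hypothetical container: named checks plus a list of consequences."""

    # Each check returns True when things are fine.
    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    # Each consequence is an action to take when any check fails.
    consequences: List[Callable[[], None]] = field(default_factory=list)

    def poll(self) -> List[str]:
        """Evaluate every check once; fire all consequences if any failed."""
        failed = [name for name, check in self.checks.items() if not check()]
        if failed:
            for act in self.consequences:
                act()
        return failed


# Illustrative stand-ins for other daemons' state and actions.
state = {"flow_sccm": 0.0, "furnace_on": True}
dog = Watchdog(
    checks={"gas_flowing": lambda: state["flow_sccm"] > 1.0},
    consequences=[lambda: state.update(furnace_on=False)],
)
failed = dog.poll()
```

With no gas flowing, `poll()` reports the failed check by name and the single consequence switches the stand-in furnace off. Returning the failed names rather than a bare boolean is a design choice that makes later logging or alerting easier.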

My question to the community: would other people use this? What similar functionality have you been wishing for?

[1] https://www.brooksinstrument.com/en/products/mass-flow-controllers
[2] https://www.vici.com/vval/vval_2pos.php

untzag commented 2 years ago

As always with yaq, dynamic and static issues appear here. I imagine that users will sometimes want to override watchdog checks. For example, it might become really annoying if you couldn't energize your furnace just because one of the MFCs wasn't connected up.

A general purpose implementation might need some kind of override.

ksunden commented 2 years ago

What messages would this daemon expose? It seems like more of a specialized client than a daemon.

untzag commented 2 years ago

I think this daemon will report busy as true whenever any check is active.

untzag commented 2 years ago

> What messages would this daemon expose? It seems like more of a specialized client than a daemon.

Excellent question.

To be honest, I think my main reason for "why daemon" is that I want it running all the time and we already have a working config structure.

If we forced users to give each check a name, we could expose check state as sensor channels.
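A rough sketch of what "named checks as channels" could look like, again in plain Python. The `CheckChannels` class and its method names are made up for illustration and are not taken from any yaq trait. It also assumes that "active" means a check has tripped, which the thread does not spell out.

```python
from typing import Dict, Iterable, List


class CheckChannels:
    """Hypothetical holder that presents named check results as channels."""

    def __init__(self, names: Iterable[str]):
        # Start with every check passing (1.0 = ok, 0.0 = tripped).
        self._values: Dict[str, float] = {name: 1.0 for name in names}

    def record(self, name: str, ok: bool) -> None:
        """Store the latest result for one named check."""
        self._values[name] = 1.0 if ok else 0.0

    def channel_names(self) -> List[str]:
        return list(self._values)

    def measured(self) -> Dict[str, float]:
        """Return a copy of the latest value for every channel."""
        return dict(self._values)

    @property
    def busy(self) -> bool:
        # Assumption: "active" means at least one check has tripped.
        return any(value == 0.0 for value in self._values.values())


channels = CheckChannels(["gas_flowing", "water_cooling"])
channels.record("water_cooling", ok=False)
```

Encoding each check as a number keeps the result compatible with anything that already plots or logs sensor values, at the cost of losing detail about why a check failed.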

ksunden commented 2 years ago

Also, I think that when delving even close to safety-critical tasks like this, we need to be extra careful in our messaging. Sounds like you've had that discussion with this particular group.

The fact of the matter is that if we go super general in implementation, that likely enables the exact wrong thing to be configured, so great care and documentation are a must, right away.

I'd be inclined to start with something fairly specific to prove out the idea, personally.

ksunden commented 2 years ago

That said, I think this could be paired with various forms of push alerts (from simple email to something like apprise) and could be useful even if the only action taken is to inform humans that an error has occurred.

I've long had such ideas for things like the Wright Group's lasers going CW or otherwise having an error state.
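As a sketch of the "inform humans only" case, a consequence can be a small function that formats a message and hands it to whatever transport is available. The names here (`make_alert_consequence`, `send`, the `laser-table` label, the `laser_mode_locked` check) are hypothetical, and the transport is deliberately left as a plain callable so the example makes no claims about any particular email or push library.

```python
from typing import Callable, List


def make_alert_consequence(send: Callable[[str, str], None], source: str):
    """Return a consequence that only informs humans; it touches no hardware."""

    def consequence(failed_checks: List[str]) -> None:
        title = f"watchdog alert from {source}"
        body = "failed checks: " + ", ".join(failed_checks)
        send(title, body)

    return consequence


# A list stands in for a real transport (email, push service, ...) so the
# sketch runs without network access.
outbox = []
alert = make_alert_consequence(
    lambda title, body: outbox.append((title, body)), "laser-table"
)
alert(["laser_mode_locked"])
```

Keeping the transport behind a two-argument callable means the same consequence could later be pointed at email or a push service without changing the watchdog logic.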

untzag commented 2 years ago

I think it's possible to have the correct messaging here regarding safety.

Having the ability to temporarily disable a check or consequence over the yaq interface would make some sense to me. I could see Dan sending a quick message to disable push alerts for 10 hours while aligning the laser.

I think an "all or nothing" disable would be best. If you need finer control, make multiple daemons.
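One possible shape for a temporary, all-or-nothing disable is a switch that re-enables itself after a deadline. The `Override` class below is a hypothetical sketch, not part of yaq. The clock is injectable so that the example can run without sleeping.

```python
import time
from typing import Callable, Optional


class Override:
    """Hypothetical all-or-nothing switch with automatic expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._until: Optional[float] = None  # None means "not disabled"

    def disable_for(self, seconds: float) -> None:
        """Suspend the whole watchdog for the given number of seconds."""
        self._until = self._clock() + seconds

    def enable(self) -> None:
        """Cancel any pending disable immediately."""
        self._until = None

    @property
    def enabled(self) -> bool:
        if self._until is not None and self._clock() >= self._until:
            self._until = None  # override expired; watchdog is live again
        return self._until is None


now = [0.0]  # fake clock so the example does not need to sleep
override = Override(clock=lambda: now[0])
override.disable_for(10 * 3600)  # e.g. ten hours while aligning
during = override.enabled
now[0] = 10 * 3600 + 1
after = override.enabled
```

Because the disable always carries a deadline, a forgotten override cannot leave the watchdog off indefinitely, which seems in keeping with the safety concerns raised earlier in the thread.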