pez-globo / pufferfish-software

All software for the Pufferfish ventilator.
Apache License 2.0
0 stars 1 forks source link

Need alarm mute cancellations and session tracking to handle disconnections #372

Closed ethanjli closed 3 years ago

ethanjli commented 3 years ago

To have robust behavior of the alarm mute functionality, we need to appropriately handle what happens when the firmware-backend and/or backend-frontend connections are lost (either due to physical issues or due to a software crash/restart). Specifically, we would like the following behaviors:

Sequence of events:

  1. Alarms are muted for 2 minutes
  2. Firmware is disconnected or it restarts (which may or may not cause a disconnection event in the backend)
  3. Any alarm mute should be cancelled early (firmware and backend and frontend all set AlarmMute.active to false) in the firmware and frontend and backend. The cancelled alarm mute should not be spuriously reactivated under any circumstances. For example, if the firmware restarts and it receives a stale AlarmMuteRequest with the active field set to true, it should recognize that the request is stale and ignore it instead of initiating a new alarm mute for 120 seconds.

Sequence of events:

  1. Alarms are muted for 2 minutes
  2. Backend is disconnected
  3. Any alarm mute should be cancelled early (firmware and backend and frontend all set AlarmMute.active to false) in the firmware and frontend and backend.

These requested behaviors are similar to the behavior between the firmware and backend for NextLogEvents and ExpectedLogEvent. We will need to add a session_id field to the request/response protobuf messages, so that the firmware can tell when an inbound AlarmMuteRequest belongs to a stale session:

AlarmMute:

AlarmMuteRequest:

Should the remaining field be uint32 or uint64? Should it be in units of ms or s? This needs to be determined. It may save us a few bytes to report it in seconds, but all other timestamps in the message protocols are in units of milliseconds. If we use units of seconds, we should rename the field to remaining_sec to make the unit difference very explicit. But it may be better just to use the same representation for timestamps & durations as every other protobuf message.

While things are disconnected, all alarms should be unmuted and it should not be possible to mute them (either from the frontend or from the membrane button), as it is impossible to make the entire system will mute itself under this condition (since the firmware and frontend can't synchronize with each other).

ethanjli commented 3 years ago

We also need to disable the alarms mute button while the backend or firmware is disconnected.

ethanjli commented 3 years ago

We may need to add a similar session_id key on all other *Request messages - consider the following scenario:

The backend crashes after it has sent a new request to the firmware but before it has saved the request to the filesystem. Then once it restarts, it will load a stale version of the request and send that version to the firmware. The firmware should have a way of detecting such a stale request and ignoring it.

Alternative names to consider instead of session_id: transaction_id or idempotency_key. However, session_id may make the most sense for alarm muting and for the event log's list synchronization; maybe we should use session_id everywhere for consistent naming?