rsmp-nordic / rsmp_core

RSMP core specification
MIT License

core 3.1.5, transmission of all alarms during connection (lots of messages) #142

Open · sveitech opened this issue 10 months ago

sveitech commented 10 months ago

In 3.1.5, the server handshake sequence changed to send all alarms, regardless of status (https://rsmp-nordic.github.io/rsmp_specifications/core/3.1.5/applicability/transport_of_data.html#communication-establishment-between-sites-and-supervision-system).

Since several alarms are arrays (their component ID refers to a detector or a group), this results in about 1500 alarm messages being transmitted during the initial handshake. Is this intentional?

The SXL documents for 1.0.13, 1.0.15, etc. allow up to 255 component IDs for detectors and for groups. There are 6 alarms that are arrays (A0008, A0101, A0201, A0202, A0301, A0302), so 6 * 255 = 1530.
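The worst-case figure above is a simple multiplication. A minimal sketch, assuming every one of the six array alarms is instantiated for the theoretical maximum of 255 components (the helper name `worst_case_handshake_alarms` is hypothetical, and non-array alarms are not counted):

```python
# Alarm codes defined per detector or per group, as listed in the issue.
ARRAY_ALARMS = ["A0008", "A0101", "A0201", "A0202", "A0301", "A0302"]

# Theoretical maximum number of component IDs per detector/group type,
# as stated in the SXL documents referenced above.
MAX_COMPONENTS = 255


def worst_case_handshake_alarms(array_alarms, components_per_alarm):
    """Count alarm messages if every array alarm exists for every component."""
    return len(array_alarms) * components_per_alarm


print(worst_case_handshake_alarms(ARRAY_ALARMS, MAX_COMPONENTS))  # 1530
```

Passing a realistic component count instead of 255 gives a correspondingly smaller figure.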

otterdahl commented 10 months ago

In reply to sveitech (Fri, Dec 08, 2023):

Yes, this change was made in 3.1.4, back in 2017, because of issues we saw with some supervision systems showing incorrect alarm status.

In previous versions, when reconnecting after a long connection interruption, only the active or blocked alarms were sent. Any alarms that weren't sent during the handshake were supposed to be interpreted by the supervision system as inactive, unblocked and unacknowledged; at least, that was the idea.

This didn't work very well in practice, mostly because the supervision system had trouble determining when all alarms (as part of the handshake) had been received.

As a result, if an active alarm became inactive during the connection interruption, that alarm would not be sent, and it would incorrectly stay active in the supervision system.
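The failure described above can be illustrated with a small simulation. This is a hypothetical sketch, not taken from any real supervision system: `merge_handshake` stands in for a supervisor that only updates the alarms it actually receives, because it cannot tell when the handshake is complete. Alarm state is reduced to an active/inactive flag, and the component IDs `DL1` and `SG1` are made up for illustration.

```python
def merge_handshake(supervisor_state, received_alarms):
    """Hypothetical supervisor behaviour: update only the alarms that arrive.

    Both dicts map (component_id, alarm_code) to True (active) or
    False (inactive). Alarms that are not received are left untouched,
    because the supervisor cannot tell when the handshake has ended.
    """
    updated = dict(supervisor_state)
    updated.update(received_alarms)
    return updated


# State the supervisor remembered before the connection was lost.
before = {("DL1", "A0301"): True, ("SG1", "A0201"): False}

# During the interruption, DL1's alarm cleared and SG1's became active.
site_now = {("DL1", "A0301"): False, ("SG1", "A0201"): True}

# Old rule: the site only sends active alarms (blocked ones are not modelled).
sent_old_rule = {key: active for key, active in site_now.items() if active}

after = merge_handshake(before, sent_old_rule)
print(after[("DL1", "A0301")])  # True: stale, although the site says inactive
```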

At one point we thought that, since alarms are required to be buffered, the buffer could be used to deliver any alarms that became inactive during the connection interruption, but that turned out to be unreliable: the controller might have been restarted or replaced, or otherwise have lost its buffered data.

In conclusion, the simplest way to solve the problems without changing the specification too much was to include inactive alarms in the handshake.
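With inactive alarms included, the supervisor no longer has to infer anything from missing messages. A minimal sketch under simplifying assumptions (hypothetical helper names and component IDs; alarm state reduced to active/inactive; acknowledgement, blocking and the real RSMP message fields omitted): the site reports every alarm it knows about, and each received message overwrites the supervisor's stored state.

```python
def handshake_messages(site_alarms):
    """Hypothetical site side: one message per alarm, active or not."""
    return [
        {"component": component, "code": code, "active": active}
        for (component, code), active in sorted(site_alarms.items())
    ]


def apply_handshake(supervisor_state, messages):
    """Hypothetical supervisor side: every alarm is reported explicitly,
    so each message simply overwrites the stored state."""
    updated = dict(supervisor_state)
    for msg in messages:
        updated[(msg["component"], msg["code"])] = msg["active"]
    return updated


# Supervisor state before the interruption, and the site's current state.
before = {("DL1", "A0301"): True, ("SG1", "A0201"): False}
site_now = {("DL1", "A0301"): False, ("SG1", "A0201"): True}

messages = handshake_messages(site_now)
after = apply_handshake(before, messages)
print(len(messages))            # 2: the inactive alarm is sent too
print(after[("DL1", "A0301")])  # False: the cleared alarm is now correct
```

The trade-off is the one raised in this issue: the number of handshake messages grows with the number of alarm instances rather than with the number of active alarms.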

But please also note that 255 is the theoretical limit. I've never seen a controller with that many signal groups or detector logics. Still, I agree that it causes many alarm messages to be sent.

emiltin commented 10 months ago

We should consider an efficient way for the supervisor to request the complete state of a site at initial connection.