makerplane / canfix-spec

CAN-FIX Communication Protocol Specification

Fix-Gateway needs nodestatus type quorum #4

Open e100 opened 10 months ago

e100 commented 10 months ago

I would like to add an additional Node Status Parameter ID to the specification. Since the first unused Parameter ID is 9, I thought that would be the proper ID to use.

I will be submitting a pull request to the python-canfix library and Fix-Gateway to support this.

Quorum is used in FIX Gateway to elect a leader gateway. Only the leader should perform actions such as modifying data using the compute plugin or sending data/commands to other components in the system. Quorum election is very simple: each FIX Gateway is assigned a nodeid, and each node sends its id as the value in the nodestatus quorum message. The largest value among all quorum messages that have not gone stale is the nodeid that becomes the leader.
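To make the election rule concrete, here is a minimal sketch of it. This is not the plugin code: `QuorumTracker`, the 2-second staleness timeout, and counting the gateway's own nodeid as a candidate are all assumptions made for illustration.

```python
import time

# Assumed value: how long a quorum message stays valid before it counts as old.
QUORUM_TIMEOUT = 2.0


class QuorumTracker:
    """Hypothetical helper that tracks quorum messages and picks the leader."""

    def __init__(self, my_id, timeout=QUORUM_TIMEOUT, clock=time.monotonic):
        self.my_id = my_id
        self.timeout = timeout
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # nodeid -> time of its latest quorum message

    def on_quorum_message(self, node_id):
        """Record that a quorum message carrying node_id was just received."""
        self.last_seen[node_id] = self.clock()

    def leader_id(self):
        """Return the largest nodeid among messages that are not yet old."""
        now = self.clock()
        live = [n for n, seen in self.last_seen.items()
                if now - seen <= self.timeout]
        live.append(self.my_id)  # assumption: this gateway is always a candidate
        return max(live)

    def is_leader(self):
        return self.leader_id() == self.my_id
```

With this shape, a gateway that stops hearing from a higher nodeid becomes leader again as soon as that node's last message ages past the timeout, which is the automatic takeover behaviour the election is meant to provide.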

A global variable in FIX Gateway and the FIXID 'LEADER' are set within each gateway to True or False. Those values can be used to decide whether an action should be performed within each gateway.

birkelbach commented 9 months ago

I guess I don't understand what problem this is trying to solve.

e100 commented 9 months ago

With multiple FIX Gateways on a single CAN bus, for redundancy, we need some way to ensure that only one of them performs certain actions, for example running the compute plugin or sending waypoints to an autopilot. Some data, such as engine data, might be collected on the redundant gateways using another CAN bus; only one of them needs to re-broadcast that information on the CAN-FIX bus for other components to use.

Should one of the gateways fail or lose communications, another one should automatically take over those functions.

This quorum message is how the leader gateway is elected. Each gateway is assigned an ID and sends this ID as part of the quorum message. The gateway with the highest ID that is still sending quorum messages is the leader.

Other plugins in the FIX Gateway can make decisions based on the global variable quorum.leader or the fixid LEADER. The local pyEFIS connected to the leader could, for example, use that fixid in some way if desired. By default leader is True; it is only ever set to False by the quorum plugin. So quorum-related changes only affect those who choose to enable the quorum plugin.
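As a rough illustration of how another plugin might use the flag, the sketch below gates a re-broadcast on it. The `quorum` object here is only a stand-in for the real global, and `rebroadcast_if_leader` and its `send` callback are hypothetical names, not part of FIX Gateway.

```python
from types import SimpleNamespace

# Stand-in for the global described above. It defaults to True, so nothing
# changes for gateways that never enable the quorum plugin.
quorum = SimpleNamespace(leader=True)


def rebroadcast_if_leader(send, message, state=quorum):
    """Call send(message) only when this gateway is the leader.

    Returns True if the message was sent and False if it was skipped.
    """
    if not state.leader:
        return False
    send(message)
    return True
```

For example, calling `rebroadcast_if_leader(bus_send, engine_frame)` on every gateway would put the frame on the bus from the leader only, where `bus_send` and `engine_frame` are whatever the calling plugin already uses.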

Without this we will have duplicate messages from the multiple gateways and possibly constantly conflicting information.

This is the code that elects the leader: https://github.com/e100/FIX-Gateway/blob/combined/fixgw/plugins/quorum/__init__.py#L50

The quorum canfix message is created here: https://github.com/e100/FIX-Gateway/blob/combined/fixgw/plugins/canfix/mapping.py#L245