Closed BenediktBurger closed 1 year ago
My initial proposal:
A Coordinator `Co1` joining a network follows a few steps:
1. [...] `Co2` of the Network.
2. [...] `Co2`, to tell all other Coordinators about `Co1`'s address.
3. `Co2` tells all the Coordinators signed in (`Co3`, `Co4`, ...) about `Co1` with a CO_NEW message.
4. [...] (`Co3`, `Co4`, ...) sign in to `Co1`.

Two Coordinators shall follow a more thorough sign-in/sign-out procedure than Components (address is for example host and port). The sign-in might happen because a CO_NEW message arrived or at startup. The sign-out might happen because the Coordinator shuts down.
```mermaid
sequenceDiagram
    participant r1 as ROUTER
    participant d1 as DEALER
    participant r2 as ROUTER
    participant d2 as DEALER
    Note over r1,d1: N1 Coordinator<br>at address1
    Note over r2,d2: N2 Coordinator<br>at address2
    Note over r1,d2: Sign in between two Coordinators
    Note right of r1: shall connect<br>to address2
    activate d1
    Note left of d1: created with<br> name "temp-NS"
    d1-->>r2: connect to address2
    d1->>r2: CO_SIGNIN<br>N1, address1,<br>ref:temp-NS
    par
        d1->>r2: GET local Directory
    and
        Note right of r2: stores N1 identity
        activate d2
        Note left of d2: created with<br>name "N1"
        d2-->>r1: connect to address1
        d2->>r1: CO_SIGNIN<br>N2, address2<br>your ref:temp-NS
        Note right of r1: stores N2 identity
        Note left of d1: name changed<br>from "temp-NS"<br>to "N2"
        d2->>r1: GET local Directory
    end
    d2->>r1: Here is my<br>local Directory
    Note right of r1: Updates<br>global Directory
    d1->>r2: Here is my<br>local Directory
    Note right of r2: Updates<br>global Directory
    Note over r1,d2: Sign out between two Coordinators
    Note right of r1: shall sign out from N2
    d1->>r2: CO_SIGNOUT
    Note right of r2: removes N1 identity
    d2->>-r1: CO_SIGNOUT
    Note right of r1: removes N2 identity
    deactivate d1
```
Advantage:
Reason for the reference (`ref`):
Alternative, such that the reference is not needed anymore: `Co2` responds "illegally" from its ROUTER to the other DEALER socket, in order to identify the DEALER socket with the namespace.
Here is the initial part:
```mermaid
sequenceDiagram
    participant r1 as ROUTER
    participant d1 as DEALER
    participant r2 as ROUTER
    participant d2 as DEALER
    Note over r1,d1: N1 Coordinator<br>at address1
    Note over r2,d2: N2 Coordinator<br>at address2
    Note over r1,d2: Sign in between two Coordinators
    Note right of r1: shall connect<br>to address2
    activate d1
    Note left of d1: created with<br> name "temp-NS"
    d1-->>r2: connect to address2
    d1->>r2: CO_SIGNIN<br>N1, address1
    Note right of r2: stores N1 identity
    Note right of r2: Normally illegal<br>response
    r2->>d1: ACK: Namespace is N2
    Note left of d1: stores N2 as <br> DEALER name
    activate d2
    Note left of d2: created with<br>name "N1"
    d2-->>r1: connect to address1
    d2->>r1: CO_SIGNIN<br>N2, address2
    Note right of r1: stores N2 identity
    Note left of d1: name changed<br>from "temp-NS"<br>to "N2"
    deactivate d2
    deactivate d1
```
@bilderbuchi suggested in https://github.com/pymeasure/leco-protocol/pull/38#discussion_r1100593181 :
A Coordinator `Co1` joining a network follows a few steps:
1. It signs in to one Coordinator `Co2` of the Network.
2. If successful, `Co2` sends a list of all Coordinators (and their addresses) that it knows (could be part of sign-in)
3. `Co1` signs in to all Coordinators on this list that it does not know yet (step 1)
4. All Coordinators are connected to all others.
@bklebel suggested in https://github.com/pymeasure/leco-protocol/pull/38#discussion_r1102355317 :
A Coordinator `Co1` joining a network follows a few steps:
1. `Co1` signs in to one Coordinator `Co2` of the Network.
2. After a successful sign-in handshake, `Co1` requests a list of all other Coordinators known to `Co2` (`Co3`, `Co4`, ...).
3. `Co1` signs in to all the Coordinators on the list of `Co2`
4. `Co1` requests the lists of the connected Coordinators from all now connected Coordinators (except from `Co2`, it already has this one)
5. `Co1` compares the lists, and notifies the Coordinators that have an incomplete set about the Coordinators they are missing
6. Coordinators with previously incomplete sets sign in to the new Coordinators they have been told about, until all Coordinators are connected to all others

In this case, if some Coordinator somehow drops out and the lists get out of sync, the whole system brings itself back into sync at every sign-in of a new Coordinator. The only new message type would be "hey, you are missing a few more connections". For clarity, I separated the request for the list of connected Coordinators from the initial SIGNIN, although that could be part of the SIGNIN flow.
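The comparison in steps 4 to 6 boils down to set arithmetic. Here is a minimal sketch, assuming each Coordinator reports its connections as a plain set of names (the function name and the data layout are made up for illustration, they are not part of the protocol):

```python
def missing_links(directories: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each Coordinator, return the Coordinators it is not yet connected to.

    `directories` maps a Coordinator name to the set of Coordinators it
    reports as connected (hypothetical, simplified layout).
    """
    everyone = set(directories)
    for known in directories.values():
        everyone |= known
    missing = {}
    for name, known in directories.items():
        lacking = everyone - known - {name}
        if lacking:
            missing[name] = lacking
    return missing


# Co1 has just signed in to Co2, Co3 and Co4; Co4 had lost its link to Co3.
reports = {
    "Co1": {"Co2", "Co3", "Co4"},
    "Co2": {"Co1", "Co3", "Co4"},
    "Co3": {"Co1", "Co2", "Co4"},
    "Co4": {"Co1", "Co2"},
}
print(missing_links(reports))  # {'Co4': {'Co3'}}
```

`Co1` would then send the "you are missing a few more connections" message only to `Co4`.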
We would need a separate mapping from Coordinator full name to Coordinator (router) address.
Remember that we have a separate DEALER for every other Coordinator's connected ROUTER socket. Yes, we need (in the implementation) to keep track of which DEALER socket belongs to which remote Coordinator's ROUTER address, but we need to do that anyways.

> [...] track of which DEALER socket belongs to which remote Coordinator's ROUTER address, but we need to do that anyways.

We do not need (in principle) to keep track of addresses. As long as I know which DEALER socket leads to which Namespace, I do not need the address of that Namespace's Coordinator (only for the initial connect).
I see, however, the benefit of your proposals. So (using both of your proposals as a base):
1. `Co1` signs in to `Co2` of the Network.
2. `Co2` also signs in to `Co1`.

The advantage of this (note step 4 for both Coordinators) is that two Networks may be joined if a single Coordinator of one Network signs in to a Coordinator of the other Network. Another advantage: as all other Coordinators update their Coordinator list during sign-in, the Network gets "healed" from missing links.
Example:
It works even in another sequence:
If we store the address (in our local directory), the CO_SIGNIN message (which contains Namespace in the sender name and the address) is sufficient to identify the corresponding DEALER socket:
One difficulty: if the address used for the connection differs from the address the Coordinator sends via CO_SIGNIN (e.g. a full name like "machine.company.com" vs. "machine" vs. "123.03.5.12"), this no longer works without doing name resolution and comparing the IP addresses (as bytes, not as strings, due to zeros).
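To illustrate the "bytes, not string" point, here is a small sketch that compares two numeric addresses after packing them into bytes. It assumes addresses of the form `host:port` with a dotted IPv4 host; the port `12300` is only a placeholder, and host names would first need name resolution (e.g. via `socket.getaddrinfo`), which is left out here:

```python
def ipv4_to_bytes(text: str) -> bytes:
    """Pack a dotted-decimal IPv4 string into 4 bytes, ignoring leading zeros."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted IPv4 address: {text!r}")
    # bytes() raises ValueError for values outside 0..255
    return bytes(int(part, 10) for part in parts)


def same_address(address_a: str, address_b: str) -> bool:
    """Compare two numeric 'host:port' addresses by packed IP and port."""
    host_a, port_a = address_a.rsplit(":", 1)
    host_b, port_b = address_b.rsplit(":", 1)
    return ipv4_to_bytes(host_a) == ipv4_to_bytes(host_b) and int(port_a) == int(port_b)


print("123.03.5.12:12300" == "123.3.5.12:12300")              # False
print(same_address("123.03.5.12:12300", "123.3.5.12:12300"))  # True
```

The string comparison fails because of the leading zero, while the packed comparison treats both spellings as the same host.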
@bklebel remember, when designing these flows, that they should be as composable as possible, with few decisions/branches, and needing little/no state. One process can trigger another (or itself again). This should result in a cleaner, slimmer list of processes, and there is less state carried around.
In your case, the steps 1., 3., 6. are quite similar - ideally they would just trigger another instance of the same sign-in procedure, and everything would shake out as desired automatically, without many checks for "unknown this", "rest of that", etc. That's what I aimed at with my proposal.
In that vein, I like Benedikt's last cleaned up proposal, which is basically a two-sided version of this: (tweaked the wording to make it clear that the same process just starts again with another set of participants)
@bilderbuchi yes, that was in principle the idea behind it, in trying to make it more clear I muddied it. I like your last proposal here. What was missing in your original proposal (I think) was the part where, in one sign-in process, the Coordinator which starts the conversation also tells the other about their local directory (possibly assuming that the directory of a newly started Coordinator is empty).
@bmoneke I like the "illegal" answer to the DEALER socket. In general, I think it would be a good idea to rather say that a Coordinator may only start a conversation using its DEALER socket, but if there is just a bit of back-and-forth in that one conversation, it should best go across this one channel of DEALER-ROUTER connection. Regarding the implementation, if we handled it otherwise, the Coordinator would now have to filter through incoming messages on the ROUTER socket which are related to this one conversation here, which would be more complicated than having this one DEALER connection here anyways, to which we can answer very simply.
> - Both Coordinators sign in to each Coordinator they are not yet signed in with

What this wording misses is the repetition of steps 3 and 4, unless we include both steps in the definition of "sign in".
Another additional idea: we want to add "Coordinator heartbeats" where they announce the local directory regularly (every fraction of an hour). At reception of that message, the receiving Coordinator shall connect to all unknown Coordinators. We can use the same message after a sign in.
So the following change:
Then the message handling states: Check whether each Coordinator in the local Directory is known. If not, sign in.
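To check that this single rule is enough, here is a toy in-memory simulation. It is only a sketch: there are no sockets, the class and its attributes are invented for illustration, and the "directory" is reduced to a set of Coordinator namespaces.

```python
class Coordinator:
    """Toy in-memory Coordinator; a real one would use DEALER/ROUTER sockets."""

    def __init__(self, namespace, network):
        self.namespace = namespace
        self.network = network  # hypothetical lookup: namespace -> Coordinator
        self.signed_in_to = set()  # namespaces this Coordinator is connected to
        network[namespace] = self

    def sign_in(self, namespace):
        if namespace == self.namespace or namespace in self.signed_in_to:
            return  # ourselves or already known: nothing to do
        self.signed_in_to.add(namespace)
        # After signing in, announce our directory (including ourselves).
        directory = self.signed_in_to | {self.namespace}
        self.network[namespace].handle_directory(directory)

    def handle_directory(self, directory):
        # The rule from the text: sign in to every Coordinator not yet known.
        for namespace in sorted(directory):
            self.sign_in(namespace)


network = {}
for name in ("A1", "A2", "B1", "B2"):
    Coordinator(name, network)
# Two separate, fully connected Networks: {A1, A2} and {B1, B2}.
network["A1"].signed_in_to = {"A2"}
network["A2"].signed_in_to = {"A1"}
network["B1"].signed_in_to = {"B2"}
network["B2"].signed_in_to = {"B1"}

network["A1"].sign_in("B1")  # a single sign-in joins both Networks
for name, coordinator in sorted(network.items()):
    print(name, sorted(coordinator.signed_in_to))
```

In this toy model every Coordinator ends up connected to the three others, because each directory message just triggers further instances of the same sign-in.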
> I like the "illegal" answer to the DEALER socket.

In fact, the DEALER response is more difficult to handle:
First proposal does not require any additional logic:
Proposal with answer to dealer socket:
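I might have a solution: keep a list of DEALER sockets and check whether a message has arrived on any of them.
```python
handle_router()  # first handle whatever arrived at the ROUTER socket
for sock in open_sockets:  # DEALER sockets still requiring an answer
    if sock.poll(timeout=0):  # zero timeout: check without blocking
        handle_dealer_message(sock)
```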
If we do not wait for any dealer socket, that for loop ends immediately.
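A self-contained sketch of that idea, using a stand-in class instead of real sockets. `FakeDealer` and `check_pending_dealers` are hypothetical names, and removing a socket from the list once its answer has arrived is my assumption, not something decided above:

```python
from collections import deque


class FakeDealer:
    """Stand-in for a DEALER socket; only mimics a non-blocking poll/recv pair."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()

    def poll(self, timeout=0):
        return len(self.inbox)  # truthy if a message is waiting

    def recv(self):
        return self.inbox.popleft()


def check_pending_dealers(open_sockets, handle_dealer_message):
    """Poll only the DEALER sockets that still expect an answer."""
    for sock in list(open_sockets):  # copy: answered sockets are removed below
        if sock.poll(timeout=0):
            handle_dealer_message(sock, sock.recv())
            open_sockets.remove(sock)  # answer received, stop watching it


received = []
waiting = [FakeDealer("temp-NS")]
check_pending_dealers(waiting, lambda sock, msg: received.append(msg))  # nothing yet
waiting[0].inbox.append("ACK")
check_pending_dealers(waiting, lambda sock, msg: received.append(msg))
print(received, waiting)  # ['ACK'] []
```

With an empty `waiting` list the loop body never runs, so the extra check costs nothing during normal operation.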
Let's do it via the Dealer!
> What was missing in your original proposal (I think) was the part where, in one sign-in process, the Coordinator which starts the conversation also tells the other about their local directory (possibly assuming that the directory of a newly started Coordinator is empty).

Indeed, you're right.

> What this wording misses is the repetition of steps 3 and 4, unless we include both steps in the definition of "sign in".

They are part of the "sign in" -- the list of enumerated steps defines the process (the way I understood it).
I updated my version in the PR according to this discussion.
I have to modify the sign-in procedure, as I ran into timing issues (the CO_SIGNIN message arrives at the ROUTER before the acknowledgment arrives at the DEALER, such that several connections are established...)
```mermaid
sequenceDiagram
    participant r1 as ROUTER
    participant d1 as DEALER
    participant r2 as ROUTER
    participant d2 as DEALER
    Note over r1,d1: N1 Coordinator<br>at address1
    Note over r2,d2: N2 Coordinator<br>at address2
    Note over r1,d2: Sign in between two Coordinators
    Note right of r1: shall connect<br>to address2
    activate d1
    Note left of d1: created with<br> name "temp-NS"
    d1-->>r2: connect to address2
    d1->>r2: V|COORDINATOR|N1.COORDINATOR|H|<br>CO_SIGNIN
    Note right of r2: stores N1 identity
    r2->>d1: V|N1.COORDINATOR|N2.COORDINATOR|H|ACK
    Note left of d1: DEALER name <br>set to "N2"
    d1->>r2: V|N1.COORDINATOR|N2.COORDINATOR|H|<br>Here is my local directory
    Note right of r2: Updates global <br>Directory and signs <br>in to all unknown<br>Coordinators,<br>also N1
    Note over d1,r2: Mirror of above sign in
    activate d2
    Note left of d2: created with<br>name "N1"
    d2-->>r1: connect to address1
    d2->>r1: V|COORDINATOR|N2.COORDINATOR|H|<br>CO_SIGNIN
    Note right of r1: stores N2 identity
    r1->>d2: V|N2.COORDINATOR|N1.COORDINATOR|H|ACK
    Note left of d2: Name is already "N1"
    d2->>r1: V|N2.COORDINATOR|N1.COORDINATOR|H|<br>Here is my local directory
    Note right of r1: Updates global <br>Directory and signs <br>in to all unknown<br>Coordinators
    Note over r1,d2: Sign out between two Coordinators
    Note right of r1: shall sign out from N2
    d1->>r2: CO_SIGNOUT
    Note right of r2: removes N1 identity
    d2->>-r1: CO_SIGNOUT
    Note right of r1: removes N2 identity
    deactivate d1
```
Now we have a hard sequence (no concurrency) and, again, symmetry. We use the directory exchange to connect to the other Coordinator.
We could even use the normal "SIGNIN"/"SIGNOUT" commands (with filtering for the Component name==COORDINATOR)!
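A sketch of what that filtering could look like. It assumes the `V|receiver|sender|H|payload` notation from the diagram (that field order is my reading of the diagram, not a settled frame layout); `N1.actor` is a made-up Component name:

```python
def is_coordinator_sender(message: str) -> bool:
    """Return True if the sender's Component name is COORDINATOR."""
    version, receiver, sender, header, payload = message.split("|", 4)
    component_name = sender.rsplit(".", 1)[-1]  # "N1.COORDINATOR" -> "COORDINATOR"
    return component_name == "COORDINATOR"


def handle_signin(message: str) -> str:
    """Dispatch one shared SIGNIN command by the kind of sender."""
    if is_coordinator_sender(message):
        return "coordinator sign-in"
    return "component sign-in"


print(handle_signin("V|COORDINATOR|N1.COORDINATOR|H|SIGNIN"))  # coordinator sign-in
print(handle_signin("V|COORDINATOR|N1.actor|H|SIGNIN"))        # component sign-in
```

This would only work if no ordinary Component is allowed to call itself `COORDINATOR`.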
For this reason, it is good to have a test environment: I encountered the problem while updating my implementation according to the PR.

> For this reason, it is good to have a test environment: I encountered the problem while updating my implementation according to the PR.

Agreed, I like it! Ideally, these tests make it into a test suite, so that we can keep checking assumptions etc.
With test environment, I meant an actual implementation (in contrast to production environment). I have a script starting a second Coordinator connecting to a first one, in order to test it quite easily.
However, I'm writing the appropriate tests as well, to ensure everything works properly and to catch errors.
This error, however, manifested (due to timing) in the implementation (connecting two Coordinators) and not in unit tests.
I'm using the Coordinators / Actors already in the lab (keeping them in sync with decided upon points of leco).
Done by #38
The procedure to sign in one Coordinator to another is more complex; let us discuss it here.
This discussion also has consequences for the style of the Directory.
Initial situation:
Goal: