Previously we couldn't have the x25519 map in the state_t because the X25519 pubkey is derived from the service node's Ed25519 pubkey, which we didn't learn about until we received an uptime proof containing it. Thus our X25519 pubkey list could be incomplete when proofs were missing or delayed, which could sometimes leave some nodes (temporarily) unable to communicate with each other until a proof went out to communicate the Ed25519 (and thus implicitly the X25519) pubkey.
In HF21 we now require unified pubkeys, which means the main SN pubkey is always the Ed25519 pubkey, and so we can always compute the X25519 pubkeys directly from the SN list.
This PR takes advantage of that by adding an x25519 pubkey map to the state_t and updating it on initialization and whenever service nodes are added or removed, so that we always know the full set of x25519 pubkeys without having to wait for proofs. (We don't need to serialize this: it can be constructed on the fly from the main SN pubkey, as long as we're on HF21+.)
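The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the actual oxen-core code: `sn_state_sketch`, `x25519_to_pub`, `add_node`, `remove_node`, `rebuild_x25519_map`, and the `derive_fn` hook are all hypothetical names, and the Ed25519-to-X25519 conversion is passed in as a callback rather than implemented here (in practice something like libsodium's `crypto_sign_ed25519_pk_to_curve25519` would fill that role).

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <utility>

using pubkey = std::array<std::uint8_t, 32>;

// Hypothetical hook: converts a primary (Ed25519) pubkey to its X25519 pubkey.
// The real conversion is not implemented in this sketch.
using derive_fn = std::function<pubkey(const pubkey&)>;

// Hypothetical stand-in for the state object; not the real state_t.
struct sn_state_sketch {
    std::map<pubkey, int> service_nodes;     // primary pubkey -> placeholder info
    std::map<pubkey, pubkey> x25519_to_pub;  // derived data, never serialized
    derive_fn derive;

    explicit sn_state_sketch(derive_fn d) : derive(std::move(d)) {}

    // Called on initialization (e.g. after loading serialized state):
    // recompute the whole map from the primary pubkeys.
    void rebuild_x25519_map() {
        x25519_to_pub.clear();
        for (const auto& [pk, info] : service_nodes)
            x25519_to_pub[derive(pk)] = pk;
    }

    // Keep the derived map in sync when a node registers...
    void add_node(const pubkey& pk, int info) {
        service_nodes[pk] = info;
        x25519_to_pub[derive(pk)] = pk;
    }

    // ...and when it leaves the list.
    void remove_node(const pubkey& pk) {
        if (service_nodes.erase(pk) > 0)
            x25519_to_pub.erase(derive(pk));
    }

    // Look up the primary pubkey for an incoming X25519 connection key.
    std::optional<pubkey> find_by_x25519(const pubkey& x) const {
        auto it = x25519_to_pub.find(x);
        if (it == x25519_to_pub.end())
            return std::nullopt;
        return it->second;
    }
};
```

Because the map is a pure function of the SN list, rebuilding it after deserialization gives the same result as maintaining it incrementally, which is why it doesn't need to be stored.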
Doing this resolves some quorumnet communication failures (such as in pulse participation) that can happen when service nodes try working with other service nodes but the proof of the connecting SN hasn't propagated yet (or was missed).
This also changes the OMQ SN list code to include all SNs instead of only active SNs with proofs: decommed SNs can still sometimes try to communicate over oxenmq with other nodes and there's no reason they need to be excluded from such comms. (Pre-HF21 we can still only include nodes with proofs, of course, because without a proof we don't know the X25519 pubkey that will be used.)
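As a rough sketch of the selection rule just described (hypothetical names throughout: `node_view`, `omq_x25519_list`, and `HF_UNIFIED_PUBKEYS` are illustrative, not the real oxen-core identifiers), the list handed to OMQ might be built like this. The sketch assumes the pre-HF21 branch filters only on whether a proof has been received:

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <vector>

using pubkey = std::array<std::uint8_t, 32>;

// Hypothetical per-node view used only for this illustration.
struct node_view {
    pubkey primary;                      // main SN pubkey
    bool active;                         // false for decommissioned nodes
    std::optional<pubkey> proof_x25519;  // X25519 key from an uptime proof, if any
    pubkey derived_x25519;               // only meaningful with unified pubkeys
};

constexpr int HF_UNIFIED_PUBKEYS = 21;  // assumption: HF21 as described in the text

// Collect the X25519 pubkeys of nodes allowed to talk to us as service nodes.
std::vector<pubkey> omq_x25519_list(const std::vector<node_view>& nodes, int hf_version) {
    std::vector<pubkey> out;
    for (const auto& n : nodes) {
        if (hf_version >= HF_UNIFIED_PUBKEYS) {
            // HF21+: every registered node is included, active or decommissioned,
            // because the X25519 key follows from the primary pubkey.
            out.push_back(n.derived_x25519);
        } else if (n.proof_x25519) {
            // Pre-HF21: the key is only known once a proof has arrived.
            out.push_back(*n.proof_x25519);
        }
    }
    return out;
}
```

Note that `active` is deliberately not consulted: the point of the change is that decommissioned nodes are no longer dropped from the list.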
Also included here is a fix to the set_log command (which was broken) and related RPC methods, which annoyed me in trying to diagnose this; and another SN state change log statement that should be in the global category.