Closed sgeisler closed 5 years ago
I need to think about the problem more. The part that I understood could be done at the networking layer as I wrote in #365.
Did you try out whether this change would actually work, even after some nodes joined and left? My concern is this:
The epoch exposed to the user is the sum of the internal era
and epoch
. Every time the set of validators changes, the internal HoneyBadger
instance is restarted at epoch
0
, and the era
is set to the current epoch + 1
. Rejoining will only work if the new node's era
and epoch
match the network's. It doesn't suffice if the (user-visible) sum matches.
Regarding the tests, our plans in the long term are to replace tests/network
with tests/net
, so it would be better to use VirtualNet
.
The epoch exposed to the user is the sum of the internal era and epoch
Oh, I didn't notice that, I always thought of them as two different things (era for key sets, epoch for rounds in an era). Actually my use case stays at era 0 forever, so I never tried to get the current era :smile: I just wanted to use the queuing HB. Since that's probably not the standard use case there should be two "getters" imo: one for the era and one for the epoch.
If that's too much of a change for you I'm ok with just forking and maintaining that diff myself and maybe try to build my own queuing HB on top of HB instead of dynamic HB later.
I can't see at the moment how manually setting the epoch on construction could result in an invalid state (you may end up with a confused honey badger, but not an invalid one). So I guess this is okay?
When rejoining after a connection loss, would it not be prudent to keep the old algorithm instance around? After all, it's built to be asynchronous; as @vkomenda said, it sounds like a networking problem (the honey badger would ideally sort itself out).
Crashing the program is a different matter however, so there's definitely merit in recovering from that by storing state and picking up where one left off.
I'll leave it up @afck and @vkomenda to judge if these changes make sense, purely from an implementation standpoint they look good to me =).
@sgeisler: We recently (just now!) changed the API regarding the passing of random number generators, those lines are in close proximity to your changes, so you will have to rebase. Sorry.
Also, the TestNetwork
is, as @afck pointed out, being phased out in favor of VirtualNet
. New-style tests use proptest and the net
module; see tests/README.md
for a detailed intro of how the new framework works. It's also fairly well documented (I hope).
@mbr: Yes, in case of a short-term connection loss, the Honey Badger should just keep going! The application logic must make sure that all messages are delivered after reconnecting. SenderQueue
does that automatically.
After a crash or a long disconnect, however, this makes sense to me.
I'm fine with merging this, if @vkomenda is.
Implementing an epoch setter for the
DynamicHoneyBadgerBuilder
enables the creation of aDynamicHoneyBadger
that will join the consensus at a given epoch (fixes #365 ).I'm not yet familiar enough with your code base to know what testing would be expected for that change and where to put the tests. You seem to mainly rely on integration tests, so I guess I could patch
TestNetwork
to support late joiners and then add a test with one?TODO: