Leader election times seem too long with max current term

huanghj78 commented 8 months ago

I use a fault injection tool to modify the term field of AppendEntriesRequest to max uint64. When the current term is 18446744073709551615 (max uint64) , restart current leader node to start a new election. It takes a long time to select a new leader(about 1min). The testing system is rqlite. Some crucial logs can be find in https://github.com/rqlite/rqlite/issues/1712#issuecomment-1975546515

banks commented 8 months ago

Both Term and Index in raft can never go backwards and must proceed forwards for the cluster to make progress.

We use uint64s to make them "effectively infinite". but it's totally expected that if either ever does approach the max uint that the library will stop functioning.

Honestly I'm a bit surprised you could get a new leader elected at all, but even if you could I would not trust this library to be correct since it means the term would have wrapped around to 0 which violates all Raft's design.

So I'd say what you are testing is just outside the design envelope that we can reasonably support.

In practice, a cluster would have to hold a new elections every 10ms for nearly 6 billion years for this to be a real issue so it doesn't seem worth worrying about or trying to test to me.

Even Index which increases many orders of magnitude quicker in most clusters would require 100k writes per second (10x more than current max known users of this library) for nearly 6 million years to come close...

If I missed some reason you think this is significant please let me know but I'd lean towards not spending time solving anything related to uint64 integer limits for now!

banks commented 8 months ago

Extra note: I realise you found this while doing fault injection testing (which is great!). I'm trying to understand the fault model you are aiming for though.

If your concern is that some packet-level corruption makes the Term or Index much higher then I think that is best solved at the transport or storage layer with checksums etc. If your concern is a malicious actor intentionally setting a high term then that's outside the scope of Raft - Raft is a non-byzantine consensus algorithm that inherently has to trust the other nodes are correct. TLS or similar cryptographic transport can ensure we don't have to trust the whole network, but the whole Raft algorithm itself fails if you assume that nodes or actors may deliberately deviate from the specified algorithm during operation (i.e. arbitrarily increasing their advertised term). If you need Byzantine Fault Tolerance then you need something very different to Raft (think more like a blockchain).

otoolep commented 8 months ago

I agree with @banks - this seems to be a case of pushing Raft outside of its stated operating parameters.

huanghj78 commented 8 months ago

@banks Thank you for your reply. I agree with your explanation. My purpose in conducting fault injection is to test the robustness of systems using Raft in some fault scenarios. In fact, the presence of malicious nodes was not considered, as this is indeed not a problem that Raft can solve. The reason why I modified the term value is to simulate the situation where other nodes recover from the network partition and join the cluster.

Honestly I'm a bit surprised you could get a new leader elected at all, but even if you could I would not trust this library to be correct since it means the term would have wrapped around to 0 which violates all Raft's design.

As you mentioned, can this issues be considered as a result of not taking into account this edge case. I inject the same fault into ETCD, the program will panic and exit. (Does this seem more in line with expectations?)

banks commented 8 months ago

@huanghj78 Yes I think panic would be a better option here. I'm not sure it's worth spending limited resources on handling a fault that will almost certainly never happen outside of simulation though! I do like the approach of fault injection in general so I'd be open to a PR to fix this provided it was very simple and low risk of introducing unwanted behaviour.

But that said, it seems to me that fault-injection testing is most valuable when steered towards faults that are within the boundaries of possibility/intended design for a real system!

I'll leave this open for a short time in case you'd like to contribute a fix as part of your research, but I'm not sure it's important enough to keep open long term as there are many other issues that have more realistic consequences where I'd rather we spent time!

Thanks again for your great report and for testing things thoroughly though - it's very much appreciated regardless of the outcome here!

hashicorp / raft

Leader election times seem too long with max current term #586