real-logic / aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport
https://aeron.io
Apache License 2.0

Aeron Cluster fails to start when appVersion >= 1.0.0 #1671

Open zyulyaev opened 1 month ago


See test case: https://github.com/real-logic/aeron/commit/34586a2803143343431b6506933330c58508d75b

Currently, AppVersionValidator always receives appVersionUnderTest=0 in onNewLeadershipTerm. This happens because ConsensusPublisher never encodes appVersion into the new leadership term message, which is why a cluster with appVersion >= 1.0.0 fails to start.
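To illustrate the failure mode, here is a minimal, self-contained Java sketch with no Aeron dependency. It assumes two things: the version is packed the way Agrona's SemanticVersion packs it, ((major << 16) | (minor << 8) | patch), and the default validator treats versions as compatible only when their major versions match. NewLeadershipTermStub and check are hypothetical stand-ins, not Aeron APIs. An appVersion that is never encoded decodes as 0 (that is, 0.0.0), so any running version with major >= 1 is rejected, while 0.x.y versions happen to pass.

```java
// Hypothetical, self-contained reproduction of the version check (no Aeron dependency).
final class AppVersionCheckDemo
{
    // Assumed packing, mirroring Agrona's SemanticVersion: (major << 16) | (minor << 8) | patch.
    static int compose(final int major, final int minor, final int patch)
    {
        return (major << 16) | (minor << 8) | patch;
    }

    static int major(final int version)
    {
        return (version >> 16) & 0xFF;
    }

    static String format(final int version)
    {
        return major(version) + "." + ((version >> 8) & 0xFF) + "." + (version & 0xFF);
    }

    // Assumed semantic-versioning rule: compatible only if the major versions match.
    static boolean isCompatible(final int appVersionRunning, final int appVersionUnderTest)
    {
        return major(appVersionRunning) == major(appVersionUnderTest);
    }

    static String check(final int running, final int underTest)
    {
        return isCompatible(running, underTest) ?
            "compatible: " + format(running) + " log=" + format(underTest) :
            "incompatible version: " + format(running) + " log=" + format(underTest);
    }

    // Hypothetical stand-in for a decoded new leadership term message:
    // an int field that the publisher never sets keeps its default value of 0.
    static final class NewLeadershipTermStub
    {
        int appVersion;
    }

    public static void main(final String[] args)
    {
        final int running = compose(1, 0, 0);

        final NewLeadershipTermStub notEncoded = new NewLeadershipTermStub();
        System.out.println(check(running, notEncoded.appVersion)); // incompatible version: 1.0.0 log=0.0.0

        final NewLeadershipTermStub encoded = new NewLeadershipTermStub();
        encoded.appVersion = running;
        System.out.println(check(running, encoded.appVersion)); // compatible: 1.0.0 log=1.0.0
    }
}
```

The first line of output has the same shape as the error in the log below; once the field is actually encoded, the same check passes.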

Relevant logs:

1 observations from 2024-10-19 13:05:04.419+0100 to 2024-10-19 13:05:04.419+0100 for:
 io.aeron.cluster.client.ClusterException: ERROR - incompatible version: 1.0.0 log=0.0.0
    at io.aeron.cluster.ConsensusModuleAgent.onNewLeadershipTerm(ConsensusModuleAgent.java:960)
    at io.aeron.cluster.ConsensusAdapter.onFragment(ConsensusAdapter.java:143)
    at io.aeron.FragmentAssembler.onFragment(FragmentAssembler.java:118)
    at io.aeron.logbuffer.TermReader.read(TermReader.java:76)
    at io.aeron.Image.poll(Image.java:324)
    at io.aeron.Subscription.poll(Subscription.java:195)
    at io.aeron.cluster.ConsensusAdapter.poll(ConsensusAdapter.java:69)
    at io.aeron.cluster.ConsensusModuleAgent.doWork(ConsensusModuleAgent.java:359)
    at org.agrona.concurrent.AgentRunner.doWork(AgentRunner.java:304)
    at org.agrona.concurrent.AgentRunner.workLoop(AgentRunner.java:296)
    at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:162)
    at java.base/java.lang.Thread.run(Thread.java:1575)

As a side note, it would be very helpful to be able to tell whether we are validating a message in the log or a snapshot. In our case, we maintain backward compatibility of the snapshotting logic, but maintaining backward compatibility of the log processing logic would be too much of a burden. Please let me know if I should create a separate feature request for that.
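One way the requested distinction could look is sketched below in Java. This is a hypothetical API, not part of Aeron: the Source enum and the three-argument isVersionCompatible are assumptions, and the packing again assumes Agrona's SemanticVersion layout. The example policy mirrors the use case above. A snapshot is accepted when it comes from the same major version and is not newer than the running code, while log replay requires an exact version match.

```java
// Hypothetical API sketch: Aeron's AppVersionValidator does not take a source argument.
final class SourceAwareVersionCheck
{
    // Where the version under test came from (hypothetical enum).
    enum Source
    {
        SNAPSHOT,
        LOG
    }

    // Assumed packing, mirroring Agrona's SemanticVersion: (major << 16) | (minor << 8) | patch.
    static int compose(final int major, final int minor, final int patch)
    {
        return (major << 16) | (minor << 8) | patch;
    }

    static int major(final int version)
    {
        return (version >> 16) & 0xFF;
    }

    // Example policy: snapshots are backward compatible within a major version,
    // but the log is only replayed by exactly the same application version.
    static boolean isVersionCompatible(
        final Source source, final int appVersionRunning, final int appVersionUnderTest)
    {
        switch (source)
        {
            case SNAPSHOT:
                return major(appVersionRunning) == major(appVersionUnderTest) &&
                    appVersionRunning >= appVersionUnderTest;
            case LOG:
                return appVersionRunning == appVersionUnderTest;
            default:
                throw new IllegalArgumentException("unknown source: " + source);
        }
    }

    public static void main(final String[] args)
    {
        final int running = compose(1, 1, 0);
        final int older = compose(1, 0, 0);
        System.out.println(isVersionCompatible(Source.SNAPSHOT, running, older)); // true
        System.out.println(isVersionCompatible(Source.LOG, running, older)); // false
    }
}
```

Passing the source explicitly keeps a single validator hook while letting an application apply a stricter rule to the log than to snapshots.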