tomyouyou closed 2 years ago
Related to #256, #268.
@tomyouyou this does not add any new tests. Do you think it would be realistic to add some?
@kjnilsson should be back next week to review this.
Ignore the OCI image publishing failure: this repo hasn't yet been updated to attempt publishing only when Actions has access to the required credentials (which depends on who submits the PR).
Ok I've reviewed this change and I think it is good. Thank you @tomyouyou
Writing a test for it is possible but convoluted. We use meck to fake updated module versions, and AFAIK meck is node-global, so we'd have to use peer (slave) nodes to test a scenario where a snapshot for a lower version is taken by a member with a higher version, which is then restarted.
To reproduce the issue:
Build a 3-node cluster with rabbitmq-server-3.9.14-1.el7.noarch.rpm
Create a quorum queue 'sq12'; its machine version is v1.
Upgrade one of the nodes to rabbitmq-server-3.10.0-1.el8.noarch.rpm; call this node node-new-ver. The leader of 'sq12' is on an old-version node.
A segment file for 'sq12' has been created on node-new-ver.
Publish a message to 'sq12'. Create a consumer to receive and ack the message. Because of the segment file, this generates a snapshot on node-new-ver. The machine state in the snapshot is v1; however, the 'machine_version' in the snapshot meta is v2 instead of v1. Recovering from this invalid snapshot therefore causes an exception.
Restart the new-version node. An exception occurs when the queue is recovered from the snapshot.
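For readers who want the failure mode in miniature, here is a toy Python model of the mismatch described in the steps above. It is only a sketch: `Snapshot`, `take_snapshot`, `recover`, and the `buggy` flag are hypothetical names made up for illustration and do not mirror Ra's actual code or API. It also assumes that the wrong version gets recorded because the upgraded node stamps its own newest module version, which is one reading of the steps above rather than a confirmed description of the code path.

```python
from dataclasses import dataclass


# Toy model only: every name here is hypothetical and not taken from Ra.
@dataclass(frozen=True)
class Snapshot:
    meta_machine_version: int  # version recorded in the snapshot metadata
    state: tuple               # (state_version, payload) of the machine state


def take_snapshot(state, effective_version, local_max_version, buggy):
    """Wrap `state` in a snapshot and stamp a machine version on it.

    buggy=True stamps the newest version the local node knows about
    (assumed cause of the mismatch); buggy=False stamps the version
    the state was actually produced with.
    """
    stamped = local_max_version if buggy else effective_version
    return Snapshot(meta_machine_version=stamped, state=state)


def recover(snapshot):
    """Return the payload, refusing a snapshot whose versions disagree."""
    state_version, payload = snapshot.state
    if state_version != snapshot.meta_machine_version:
        raise ValueError(
            f"snapshot meta says v{snapshot.meta_machine_version} "
            f"but state is v{state_version}"
        )
    return payload


if __name__ == "__main__":
    v1_state = (1, {"messages": []})  # state produced by machine version 1

    # Consistent stamping: recovery succeeds.
    good = take_snapshot(v1_state, effective_version=1,
                         local_max_version=2, buggy=False)
    recover(good)

    # Upgraded node stamps v2 onto a v1 state: recovery fails on restart.
    bad = take_snapshot(v1_state, effective_version=1,
                        local_max_version=2, buggy=True)
    try:
        recover(bad)
    except ValueError as err:
        print("recovery failed:", err)
```

The point of the model is only that the version in the snapshot meta has to describe the state that was snapshotted, not the newest version available on the node that happened to write it.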