streamnative / bookkeeper-achieved

Apache Bookkeeper
https://bookkeeper.apache.org
Apache License 2.0

ISSUE-2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch #408

Closed sijie closed 3 years ago

sijie commented 3 years ago

Original Issue: apache/bookkeeper#2795


BUG REPORT

Describe the bug

We tested upgrading Bookkeeper (in Kubernetes) from 4.11 to 4.14. Specifically, in the new version of Bookkeeper we wanted to use the Bookie ID functionality to assign arbitrary identifiers to Bookies, independent of their network-related info.

During the upgrade process, we hit the following cookie mismatch error:

2021-09-14 18:10:44,499 - ERROR - [main:Main@228] - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4
bookieHost: "bookkeeper-bookie-4-10107"
journalDir: "/bk/journal/j0,/bk/journal/j1,/bk/journal/j2,/bk/journal/j3"
ledgerDirs: "4\t/bk/ledgers/l0\t/bk/ledgers/l1\t/bk/ledgers/l2\t/bk/ledgers/l3"
instanceId: "8666dd7f-38ad-4815-8b72-1e181b89a0ab"
] is not matching with [4
bookieHost: "bookkeeper-bookie-4.bookkeeper-bookie-headless.default.svc.cluster.local:3181"
journalDir: "/bk/journal/j0,/bk/journal/j1,/bk/journal/j2,/bk/journal/j3"
ledgerDirs: "4\t/bk/ledgers/l0\t/bk/ledgers/l1\t/bk/ledgers/l2\t/bk/ledgers/l3"
instanceId: "8666dd7f-38ad-4815-8b72-1e181b89a0ab"
]
        at org.apache.bookkeeper.bookie.Cookie.verifyInternal(Cookie.java:136)
        at org.apache.bookkeeper.bookie.Cookie.verify(Cookie.java:147)
        at org.apache.bookkeeper.bookie.Bookie.readAndVerifyCookieFromRegistrationManager(Bookie.java:374)
        at org.apache.bookkeeper.bookie.Bookie.checkEnvironmentWithStorageExpansion(Bookie.java:446)
        at org.apache.bookkeeper.bookie.Bookie.checkEnvironment(Bookie.java:273)
        at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:731)
        at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:152)
        at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:120)
        at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52)
        at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304)
        at org.apache.bookkeeper.server.Main.doMain(Main.java:226)
        at org.apache.bookkeeper.server.Main.main(Main.java:208)

That is, in the first incarnation of the Bookie with version 4.11, its ID was the hostname (bookkeeper-bookie-4.bookkeeper-bookie-headless.default.svc.cluster.local:3181), which makes sense because we set the useHostNameAsBookieID option to true. However, when the Bookie is upgraded to version 4.14 and explicitly given a new Bookie ID (bookkeeper-bookie-4-10107), it seems to still fetch the cookie from Zookeeper that is associated with its hostname. It then compares the cookie in Zookeeper with the local one and finds a mismatch, which is understandable because the bookieHost field differs between the two cookies.
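
The mismatch described above can be illustrated with a minimal sketch. The class CookieSketch and its matches method are hypothetical stand-ins invented for illustration; they are not the real org.apache.bookkeeper.bookie.Cookie API. The sketch only assumes that verification fails when any identifying field differs, and the field values are copied from the log above.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a bookie cookie (NOT the real Cookie class).
final class CookieSketch {
    final String bookieHost;
    final String journalDir;
    final String ledgerDirs;
    final String instanceId;

    CookieSketch(String bookieHost, String journalDir, String ledgerDirs, String instanceId) {
        this.bookieHost = bookieHost;
        this.journalDir = journalDir;
        this.ledgerDirs = ledgerDirs;
        this.instanceId = instanceId;
    }

    // Assumption: two cookies match only when every identifying field is equal.
    boolean matches(CookieSketch other) {
        return Objects.equals(bookieHost, other.bookieHost)
                && Objects.equals(journalDir, other.journalDir)
                && Objects.equals(ledgerDirs, other.ledgerDirs)
                && Objects.equals(instanceId, other.instanceId);
    }

    public static void main(String[] args) {
        String journal = "/bk/journal/j0,/bk/journal/j1,/bk/journal/j2,/bk/journal/j3";
        String ledgers = "4\t/bk/ledgers/l0\t/bk/ledgers/l1\t/bk/ledgers/l2\t/bk/ledgers/l3";
        String instance = "8666dd7f-38ad-4815-8b72-1e181b89a0ab";

        // Cookie carrying the new, explicitly configured Bookie ID.
        CookieSketch withBookieId = new CookieSketch("bookkeeper-bookie-4-10107", journal, ledgers, instance);
        // Cookie written by 4.11, keyed by hostname and port.
        CookieSketch withHostname = new CookieSketch(
                "bookkeeper-bookie-4.bookkeeper-bookie-headless.default.svc.cluster.local:3181",
                journal, ledgers, instance);

        // Only bookieHost differs, which is enough to make the comparison fail.
        System.out.println("cookies match: " + withBookieId.matches(withHostname));
    }
}
```

In this sketch every field except bookieHost is identical, which mirrors the two cookies printed in the exception message.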

In summary, this is the problem: when we provide a new Bookie ID to a Bookie using the functionality of BP-41, the Bookie still tries to look for cookies related to its network information. This looks like wrong behavior, because the Bookie should use either the Bookie ID (if provided) or the network information to look up an existing cookie, but not both.

To validate this hypothesis, we made a small change to the cookie validation process: https://github.com/apache/bookkeeper/compare/master...RaulGracia:issue-cookie-check?expand=1

With that change, the upgrade works as expected and the new Bookie incarnation uses the given Bookie ID as the identifier when looking for cookies. The question is: do you foresee any side effects of this change?

To Reproduce

Upgrade an existing Bookie from a version prior to 4.12 to a version after 4.12, providing an arbitrary Bookie ID (assuming the Bookie stays on the same host).

Expected behavior

A user-defined Bookie ID should take precedence over network-related info when looking for cookies, and the two should be mutually exclusive.
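
A minimal sketch of the expected selection rule follows. The class BookieIdResolutionSketch and the method resolveCookieKey are hypothetical names introduced for illustration; they are not part of Bookkeeper and not the change linked above. The sketch only shows how a single lookup key could be chosen, assuming an empty or missing Bookie ID means "fall back to host and port". It does not cover how cookies already stored under the old hostname would be handled.

```java
// Hypothetical illustration of the expected behavior (not Bookkeeper code).
final class BookieIdResolutionSketch {

    // Returns exactly one identifier to use for cookie lookup:
    // the user-defined Bookie ID when present, otherwise "host:port".
    static String resolveCookieKey(String configuredBookieId, String hostName, int port) {
        if (configuredBookieId != null && !configuredBookieId.trim().isEmpty()) {
            // A user-defined ID wins; network info is not consulted at all.
            return configuredBookieId.trim();
        }
        // Assumption: no ID configured means the network address is the identifier.
        return hostName + ":" + port;
    }

    public static void main(String[] args) {
        String host = "bookkeeper-bookie-4.bookkeeper-bookie-headless.default.svc.cluster.local";

        // With an explicit Bookie ID, the hostname is ignored.
        System.out.println(resolveCookieKey("bookkeeper-bookie-4-10107", host, 3181));
        // Without one, the lookup key is derived from host and port.
        System.out.println(resolveCookieKey(null, host, 3181));
    }
}
```

Because the method returns a single key, a lookup built on it cannot mix the two identifier sources, which is the exclusivity asked for above.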

Screenshots

n/a

Additional context

n/a