linkedin / kafka-remote-storage-azure

RemoteLogSegmentMetadata extensions #1

Open sutambe opened 2 years ago

sutambe commented 2 years ago

The following extensions to the RemoteLogSegmentMetadata are deemed useful.

  1. A sufficiently large Kafka cluster with tiering enabled may have to use multiple Azure Blob Storage accounts for availability and throughput reasons. The storage account chosen for writing a remote log segment may be picked dynamically rather than by a statically known scheme, and the number of Azure Blob Storage accounts in use may change over time. The RSM plugin is the best place to pick the storage account, and the RemoteLogSegmentMetadata should include the account id so that, after a leadership transfer, the subsequent leader can use the correct Blob Storage account to manage the remote log segments. Including an arbitrary byte[] for user data in RemoteLogSegmentMetadata would be useful in this case.

  2. A single Kafka cluster may operate topics with and without transactions enabled. The per-segment transaction index is, therefore, optional. The RemoteLogSegmentMetadata should include a flag indicating whether the segment has a transaction index. This would save a remote log segment metadata lookup round trip. However, with a custom mapping of a local log segment and its index files to a single blob whose layout aggregates multiple files, such a flag may be redundant.
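
To make the two proposals concrete, here is a minimal sketch of what an RSM plugin might serialize into the proposed byte[] user-data field. Everything in it is an assumption for illustration: the `SegmentUserData` class, its field names, and the wire layout (version byte, flag byte, length-prefixed UTF-8 account id) are hypothetical and not part of any Kafka API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical payload an RSM plugin could store in an opaque byte[] field
// of RemoteLogSegmentMetadata. Not part of Kafka; names and layout are assumed.
final class SegmentUserData {
    private static final byte VERSION = 1;

    private final String storageAccountId; // which Blob Storage account holds the segment
    private final boolean hasTxnIndex;     // whether a transaction index was uploaded

    SegmentUserData(String storageAccountId, boolean hasTxnIndex) {
        this.storageAccountId = storageAccountId;
        this.hasTxnIndex = hasTxnIndex;
    }

    String storageAccountId() { return storageAccountId; }

    boolean hasTxnIndex() { return hasTxnIndex; }

    // Layout: [version:1][hasTxnIndex:1][idLength:2][id bytes, UTF-8]
    byte[] toBytes() {
        byte[] id = storageAccountId.getBytes(StandardCharsets.UTF_8);
        if (id.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("Account id too long: " + id.length);
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + id.length);
        buf.put(VERSION);
        buf.put((byte) (hasTxnIndex ? 1 : 0));
        buf.putShort((short) id.length);
        buf.put(id);
        return buf.array();
    }

    // Decodes the payload written by toBytes(); rejects unknown versions.
    static SegmentUserData fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte version = buf.get();
        if (version != VERSION) {
            throw new IllegalArgumentException("Unsupported version: " + version);
        }
        boolean hasTxnIndex = buf.get() != 0;
        int length = buf.getShort();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("Corrupt account id length: " + length);
        }
        byte[] id = new byte[length];
        buf.get(id);
        return new SegmentUserData(new String(id, StandardCharsets.UTF_8), hasTxnIndex);
    }
}
```

Under these assumptions, the leader that uploads a segment would call `toBytes()` and attach the result to the segment metadata. A subsequent leader would call `fromBytes()` to learn which storage account to address and whether a transaction index fetch is worthwhile. The leading version byte leaves room to change the layout later.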