Open wenym1 opened 3 weeks ago
cc @hzxa21 @zwang28
LGTM for the proposal in general!
To make things easier, we can still support barrier read, but the batch query of barrier read won't carry any epoch information anymore. The barrier read batch query always reads the latest uncommitted data of each table, and the consistency is ignored.
+1. Sacrificing consistency for simplicity in the context of read-uncommitted queries sounds reasonable to me. cc @fuyufjh
Proposal
Generalize time travel query for all batch queries, which means that all batch queries will be handled as time travel queries.
In a single `HummockVersion`, we only provide a single view at the committed epoch rather than views at all epochs between `safe_epoch` and `committed_epoch`. As a result, we can deprecate `safe_epoch`.
Moreover, we need to deprecate support for consistent barrier reads on uncommitted epochs.
Motivation
Currently, we have `safe_epoch` in `HummockVersion` to specify that, in this `HummockVersion`, it is safe to query any epoch above this `safe_epoch`. In other words, we support querying multiple versions of data under different epochs given a single `HummockVersion`. The reason for this feature is that, in each CN, we only have a single latest `HummockVersion` (ignoring the versions pinned by already-created iterators), but in the frontend, each session pins an epoch (PinnedSnapshot), and we want to serve queries from different pinned epochs with this single latest `HummockVersion`.
This design makes the communication between frontend and CN elegant, but it comes at a price.
Now that we support time travel in batch queries, serving queries on different epochs no longer has to rely on a single hummock version. Instead, we can rebuild a hummock version for a specific epoch. Therefore, we can generalize time travel query for all batch queries: for every batch query, we first figure out a hummock version for the provided epoch, either by taking the latest version or by rebuilding a new version, and then read data from that version. Each hummock version then no longer needs to store multiple versions of a key, and `safe_epoch` can be deprecated.
Besides, we need to deprecate support for consistent barrier reads on uncommitted epochs. Currently, for an uncommitted barrier read, we pin an uncommitted non-checkpoint current epoch and use this epoch in the batch query. However, since the pinned epoch is a non-checkpoint epoch, once a later checkpoint epoch gets committed, the pinned non-checkpoint epoch falls below the committed epoch. To support consistent queries on this epoch, the committed version would still have to maintain multiple versions of values between the committed epoch and the previous checkpoint epoch. To make things easier, we can still support barrier read, but the batch query of a barrier read won't carry any epoch information anymore. The barrier read batch query always reads the latest uncommitted data of each table, and consistency is ignored.
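The read path proposed above can be sketched roughly as follows. This is a minimal illustration, not the actual RisingWave code: `Version`, `ReadEpoch`, `Resolved`, and `VersionStore` are hypothetical names, and the rule that a rebuilt version uses the greatest retained committed epoch not above the requested epoch is an assumption made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a hummock version: exactly one view,
/// at its committed epoch (no `safe_epoch` range).
#[derive(Debug, Clone, PartialEq)]
struct Version {
    committed_epoch: u64,
}

/// Hypothetical read mode carried by a batch query.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReadEpoch {
    /// Consistent read at an epoch, handled as a time travel query.
    TimeTravel(u64),
    /// Barrier read: carries no epoch, reads the latest uncommitted data.
    LatestUncommitted,
}

/// What the read is served from.
#[derive(Debug, PartialEq)]
enum Resolved {
    /// The requested epoch matches the latest version's committed epoch.
    Latest(Version),
    /// A version rebuilt for an older epoch.
    Rebuilt(Version),
    /// No snapshot is pinned; consistency is not guaranteed.
    UncommittedNoSnapshot,
}

struct VersionStore {
    latest: Version,
    /// Assumption: older committed epochs for which a version can be rebuilt.
    retained: BTreeSet<u64>,
}

impl VersionStore {
    fn new(latest_epoch: u64, retained: Vec<u64>) -> Self {
        Self {
            latest: Version { committed_epoch: latest_epoch },
            retained: retained.into_iter().collect(),
        }
    }

    /// Pick the version a batch query should read from.
    fn resolve(&self, read: ReadEpoch) -> Result<Resolved, String> {
        match read {
            ReadEpoch::LatestUncommitted => Ok(Resolved::UncommittedNoSnapshot),
            ReadEpoch::TimeTravel(epoch) if epoch == self.latest.committed_epoch => {
                Ok(Resolved::Latest(self.latest.clone()))
            }
            ReadEpoch::TimeTravel(epoch) if epoch > self.latest.committed_epoch => {
                Err(format!("epoch {epoch} is not committed yet"))
            }
            ReadEpoch::TimeTravel(epoch) => {
                // Assumed rule: greatest retained committed epoch <= requested epoch.
                match self.retained.range(..=epoch).next_back() {
                    Some(&e) => Ok(Resolved::Rebuilt(Version { committed_epoch: e })),
                    None => Err(format!("no version retained for epoch {epoch}")),
                }
            }
        }
    }
}
```

In this sketch, no version ever has to answer for a range of epochs: a consistent read either matches the latest committed epoch or gets its own rebuilt version, and a barrier read bypasses epoch resolution entirely.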
Tracking
- `HummockReadEpoch::TimeTravel` in batch query, and refine the read logic accordingly (#18459)
- `safe_epoch` and do not persist `PinnedSnapshot`