Open Lazin opened 1 year ago
This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale
label – otherwise this will be closed in two weeks.
@Lazin I presume this is still valid/needed?
@piyushredpanda this is still needed and it's part of the ntp-archiver revamp.
Version & Environment
Redpanda version: (use `rpk version`): dev
What went wrong?
We expect timequery to take only user data into account. In Redpanda, however, configuration batches are stored in the same log as user data, and these batches carry timestamps of their own. As a result, when a full segment or a segment region is uploaded, it may begin or end with a configuration batch, and that batch's timestamp ends up in the segment metadata. Tiered storage uses the segment metadata to run timequeries, so in this case the query is answered using our internal timestamps instead of only the timestamps provided by the user.
If the timestamps provided by the user are skewed a lot relative to our internal timestamps, we may return erroneous results. In v23.1 we would start searching for the batch from the beginning of the log. In v23.2, if spillover is enabled, we may not find the batch at all because tiered-storage reads can't cross the manifest boundary.
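To illustrate the failure mode, here is a minimal sketch in Python. It is not Redpanda code: `SegmentMeta`, `timequery_segment`, and the "first segment whose `max_timestamp` is at or past the query" lookup rule are simplified assumptions made for illustration only. It shows how a single internal timestamp recorded in the segment metadata can make the lookup land on the wrong segment when user timestamps are skewed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentMeta:
    """Hypothetical, simplified stand-in for per-segment manifest metadata."""
    base_offset: int
    base_timestamp: int
    max_timestamp: int


def timequery_segment(manifest: List[SegmentMeta], ts: int) -> Optional[SegmentMeta]:
    """Return the first segment whose max_timestamp is >= ts (assumed lookup rule)."""
    for meta in manifest:
        if meta.max_timestamp >= ts:
            return meta
    return None


# Assumption: the broker stamps config batches with wall-clock time,
# while the user writes small, heavily skewed timestamps.
WALL_CLOCK_MS = 1_700_000_000_000

# Segment 0 ends with a config batch, so its max_timestamp is internal.
leaky_manifest = [
    SegmentMeta(base_offset=0, base_timestamp=1_000, max_timestamp=WALL_CLOCK_MS),
    SegmentMeta(base_offset=100, base_timestamp=3_000, max_timestamp=4_000),
]

# Same segments, with metadata derived from user batches only.
user_only_manifest = [
    SegmentMeta(base_offset=0, base_timestamp=1_000, max_timestamp=2_000),
    SegmentMeta(base_offset=100, base_timestamp=3_000, max_timestamp=4_000),
]

leaky_hit = timequery_segment(leaky_manifest, 3_000)
clean_hit = timequery_segment(user_only_manifest, 3_000)
print(leaky_hit.base_offset, clean_hit.base_offset)
```

In this model the leaked timestamp makes the query for `3_000` start at offset 0 instead of offset 100. Whether the scan then recovers depends on the read path, which is where the v23.1 and v23.2 behaviours described above differ.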
The cases when we upload without taking batch type into account:
The read path handles this by generating indexes that take such batches into account. Timequery, however, also uses metadata from the manifest, and our internal timestamps may leak into that metadata.
What should have happened instead?
We should only use timestamps provided by the user to answer queries.
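One possible shape of the fix, sketched in Python under stated assumptions: the `Batch` type, its `is_user_data` flag, and `user_timestamp_range` are hypothetical names, not the archiver's actual API. The idea is to derive the segment's timestamp range from user data batches only and to report "no range" when the uploaded region contains no user data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Batch:
    """Hypothetical batch record; is_user_data is False for config batches."""
    is_user_data: bool
    timestamp: int


def user_timestamp_range(batches: Iterable[Batch]) -> Optional[Tuple[int, int]]:
    """Return (base_timestamp, max_timestamp) computed from user batches only.

    Returns None when the region holds no user data, so the caller can
    decide how to represent such a segment in the manifest.
    """
    user_ts = [b.timestamp for b in batches if b.is_user_data]
    if not user_ts:
        return None
    # Assumption: base is the first user batch's timestamp, max is the largest.
    return user_ts[0], max(user_ts)


WALL_CLOCK_MS = 1_700_000_000_000  # assumed broker-assigned timestamp

segment = [
    Batch(is_user_data=False, timestamp=WALL_CLOCK_MS),  # config batch at the start
    Batch(is_user_data=True, timestamp=1_000),
    Batch(is_user_data=True, timestamp=2_000),
    Batch(is_user_data=False, timestamp=WALL_CLOCK_MS),  # config batch at the end
]

print(user_timestamp_range(segment))
print(user_timestamp_range([Batch(is_user_data=False, timestamp=WALL_CLOCK_MS)]))
```

The `None` case is left open deliberately: how a segment without any user data should appear in the manifest is a design decision this sketch does not settle.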
How to reproduce the issue?
Additional information
n/a
JIRA Link: CORE-1422