This test could occasionally trigger a "removing local file ... because it has unexpected length" log when the compact-shard-ancestors-persistent failpoint is in use. That is unexpected, because the failpoint stops the process at a point where the remote metadata is in sync with local files.
The cause was that there are two shards on the same pageserver: while the shard being compacted stops explicitly at the failpoint, the other shard was compacting in the background and was interrupted at an unclean point. The test intends to disable background compaction, but it was mistakenly dropping its compaction_period override when it updated pitr_interval.
Update TENANT_CONF in the test to use properly typed values, so that it is usable in pageserver APIs as well as via neon_local.
When updating tenant config with pitr_interval, retain the overrides from the start of the test, so that there won't be any background compaction going on during the test.
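The difference between the old and new update can be sketched as follows. This is a toy model, not the test's actual code: FakeTenantConfigStore, DEFAULTS, and the concrete values ("20s", "0s", 128 * 1024) are hypothetical, and the sketch assumes that each config update replaces all previously set overrides, so any key left out falls back to its default.

```python
from typing import Any

# Hypothetical defaults standing in for the pageserver's built-in tenant settings.
DEFAULTS: dict[str, Any] = {"compaction_period": "20s", "pitr_interval": "7 days"}

# Overrides applied at the start of the test, with typed values (an int stays an int).
# Assumption: a compaction_period of "0s" disables background compaction.
TENANT_CONF: dict[str, Any] = {
    "compaction_period": "0s",
    "checkpoint_distance": 128 * 1024,
}


class FakeTenantConfigStore:
    """Toy config endpoint: every update replaces the full set of overrides."""

    def __init__(self) -> None:
        self.overrides: dict[str, Any] = {}

    def set_config(self, conf: dict[str, Any]) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self.overrides = dict(conf)

    def effective(self, key: str) -> Any:
        # An override wins; otherwise fall back to the default (None if unknown).
        return self.overrides.get(key, DEFAULTS.get(key))


store = FakeTenantConfigStore()
store.set_config(TENANT_CONF)

# Old behaviour: sending only pitr_interval drops the compaction_period override.
store.set_config({"pitr_interval": "0s"})
buggy_period = store.effective("compaction_period")

# New behaviour: carry the initial overrides along with the new pitr_interval.
store.set_config({**TENANT_CONF, "pitr_interval": "0s"})
fixed_period = store.effective("compaction_period")
```

Merging with `{**TENANT_CONF, ...}` keeps the initial dictionary as the single source of the test's overrides, so a later update cannot silently re-enable background compaction.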
Checklist before requesting a review
[ ] I have performed a self-review of my code.
[ ] If it is a core feature, I have added thorough tests.
[ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
[ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
Checklist before merging
[ ] Do not forget to reformat commit message to not include the above checklist
Problem
This test could occasionally trigger a "removing local file ... because it has unexpected length" log when the compact-shard-ancestors-persistent failpoint is in use. That is unexpected, because the failpoint stops the process at a point where the remote metadata is in sync with local files.
The cause was that there are two shards on the same pageserver: while the shard being compacted stops explicitly at the failpoint, the other shard was compacting in the background and was interrupted at an unclean point. The test intends to disable background compaction, but it was mistakenly dropping its compaction_period override when it updated pitr_interval.
Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8123/9602976462/index.html#/testresult/7dd6165da7daef40
Summary of changes
Update TENANT_CONF in the test to use properly typed values, so that it is usable in pageserver APIs as well as via neon_local.
When updating tenant config with pitr_interval, retain the overrides from the start of the test, so that there won't be any background compaction going on during the test.