neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

commit `Use actual temporary dir for pageserver unit tests` causes test failures #3385

Open problame opened 1 year ago

problame commented 1 year ago

Since this commit, tests fail when a sufficient number of them are run and the eviction order is unlucky.

One symptom is this type of error message in the test output:

2023-01-18T19:41:09.456084Z ERROR writeback of buffer EphemeralPage { file_id: 143, blkno: 1492 } failed: failed to write back to ephemeral file at /tmp/.tmpBRiJZx/tenants/a873a4ca1c37bb76d12a5bc8779f7369/timelines/11223344556677881122334455667788/ephemeral-143 error: No such file or directory (os error 2)

The root cause is, presumably, that the TenantHarness's TempDir is dropped, and hence the directory is deleted from the filesystem, while the ephemeral file ID is still referenced from the page cache.
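
To illustrate the suspected failure mode in isolation, here is a minimal std-only sketch. It is not the pageserver code: `ScratchDir` and `CachedPage` are hypothetical stand-ins for the TempDir guard and a page cache entry, and the sketch assumes that write-back reopens the backing file by path, which is what the `os error 2` in the log suggests.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical stand-in for a temp-dir guard: removes the directory on drop.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(name: &str) -> io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(ScratchDir { path })
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup, mirroring what a temp-dir guard does.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Hypothetical cache entry that remembers only the path of its backing file.
struct CachedPage {
    backing_file: PathBuf,
    data: Vec<u8>,
}

impl CachedPage {
    /// Assumption: write-back opens the file by path each time.
    fn write_back(&self) -> io::Result<()> {
        let mut file = OpenOptions::new()
            .write(true)
            .create(true)
            .open(&self.backing_file)?;
        file.write_all(&self.data)
    }
}

/// Returns the error kind seen when a page outlives its directory.
fn writeback_after_drop() -> io::ErrorKind {
    let page = {
        let dir = ScratchDir::new("ephemeral-demo").expect("create scratch dir");
        let page = CachedPage {
            backing_file: dir.path.join("ephemeral-1"),
            data: vec![0u8; 8192],
        };
        page.write_back().expect("write-back works while the directory exists");
        page
        // `dir` is dropped here, deleting the directory; `page` lives on.
    };
    page.write_back()
        .expect_err("write-back should fail once the directory is gone")
        .kind()
}

fn main() {
    println!("late write-back failed with: {:?}", writeback_after_drop());
}
```

The second `write_back` call fails because the parent directory no longer exists, even though the page itself is still alive. Any such entry has to be evicted or invalidated before the directory guard is dropped, or the guard has to outlive the cache.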

I'll post more root cause analysis later.

problame commented 1 year ago

Root cause analysis in PR #3388