holtgrewe closed this issue 3 years ago.
The problem might be in the postgres configuration after all. I sent an email to the PostgreSQL mailing list:
https://www.postgresql.org/message-id/dd6486da33414ed48a9a314441716f3c%40bih-charite.de
The helpful people on the postgres mailing list pointed me in the right direction. For some reason, the tarball contains an unlogged table, while the same table in our internal deployment is logged.
Now we know the main cause and can fix it. We also need to find out why our deployments diverge.
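For context, PostgreSQL does not write unlogged tables to the WAL, and it truncates them automatically after a crash or unclean shutdown. That makes an unexpectedly unlogged table a plausible explanation for losing data after an out-of-memory crash. The sketch below is a minimal, hypothetical helper and is not part of Varfish. It holds a catalog query that lists unlogged ordinary tables (`pg_class.relpersistence = 'u'`, `relkind = 'r'`), and from its result rows it builds `ALTER TABLE ... SET LOGGED` statements (available since PostgreSQL 9.5). It also compares per-table persistence between two deployments, such as the tarball and the internal one. The function names, the `example_table` name, and the idea of collecting rows with a client of your choice are assumptions.

```python
"""Hypothetical sketch: find unlogged tables and generate statements to make them logged.

Assumes the rows come from running UNLOGGED_TABLES_SQL against each deployment
with a PostgreSQL client of your choice (not shown here).
"""
from typing import Dict, Iterable, List, Tuple

# pg_class.relpersistence is 'u' for unlogged, 'p' for permanent, 't' for temporary;
# relkind = 'r' restricts the result to ordinary tables.
UNLOGGED_TABLES_SQL = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND c.relpersistence = 'u'
ORDER BY n.nspname, c.relname;
"""


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def set_logged_statements(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one ALTER TABLE ... SET LOGGED statement per (schema, table) row."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} SET LOGGED;"
        for schema, table in rows
    ]


def persistence_diff(
    tarball: Dict[str, str], internal: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Map each table present in both deployments to its (tarball, internal)
    relpersistence values, keeping only the tables where they differ."""
    return {
        name: (tarball[name], internal[name])
        for name in sorted(tarball.keys() & internal.keys())
        if tarball[name] != internal[name]
    }


if __name__ == "__main__":
    # Hypothetical row, as if returned by UNLOGGED_TABLES_SQL.
    for stmt in set_logged_statements([("public", "example_table")]):
        print(stmt)
```

Running it prints `ALTER TABLE "public"."example_table" SET LOGGED;`. Note that `SET LOGGED` rewrites the table and writes its contents to the WAL, so it can take a while on large tables.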
Describe the bug
When postgres crashes (e.g., because it runs out of memory), it tries to recover from the write-ahead log (WAL) files that are part of the prebuilt database tarball we provide. These files somehow appear to have been corrupted when we built them. This is not a problem with Varfish or postgres but with our data tarball.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
With correct WAL files, postgres should recover correctly.
Additional context
We have seen this at two sites already. It is not yet clear to me whether it would occur again with the same data download. One site re-downloaded the data tarball and saw it a second time.