Got an interesting case with one of the production read-only endpoints. The walreceiver errored out and died:
2024-06-18 14:09:16.961 {"app":"NeonVM","endpoint_id":"ep-winter-rice-59233042","pod":"compute-lingering-forest-a2yogi5o-6kzkf","_entry":"PG:2024-06-18 14:09:16.598 GMT ttid=a27b300c2ff46c602a1635ab92d236f3/03baf9167a86378faa6375d8273d0f6d sqlstate=53100 [493] FATAL: could not write to file \"pg_wal/xlogtemp.493\": No space left on device"}
2024-06-18 08:22:29.288 {"app":"NeonVM","endpoint_id":"ep-winter-rice-59233042","pod":"compute-lingering-forest-a2yogi5o-6kzkf","_entry":"PG:2024-06-18 08:22:29.127 GMT ttid=a27b300c2ff46c602a1635ab92d236f3/03baf9167a86378faa6375d8273d0f6d sqlstate=00000 [493] LOG: skipping missing configuration file \"/var/db/postgres/compute/pgdata/compute_ctl_temp_override.conf\""}
2024-06-18 08:22:24.446 {"app":"NeonVM","pod":"compute-lingering-forest-a2yogi5o-6kzkf","_entry":"PG:2024-06-18 08:22:24.347 GMT ttid=a27b300c2ff46c602a1635ab92d236f3/03baf9167a86378faa6375d8273d0f6d sqlstate=00000 [493] LOG: started streaming WAL from primary at 3/49000000 on timeline 1"}
But then it never started again.
https://neondb.slack.com/archives/C04DGM6SMTM/p1719394592373479 https://console.neon.tech/admin/regions/aws-eu-central-1/computes/compute-lingering-forest-a2yogi5o
Heikki suggested trying to reproduce this manually by adding

elog(FATAL, "crashme");

to the walsender.
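A minimal sketch of what that fault injection might look like (the exact file and placement are assumptions; the idea is just to make the process exit with a FATAL error mid-stream so we can check whether the read-only endpoint's walreceiver reconnects afterwards):

```c
/* Hypothetical fault injection somewhere in the walsender main loop
 * (e.g. in src/backend/replication/walsender.c). elog(FATAL, ...)
 * logs the message and terminates the backend, which kills the
 * replication connection and should force the standby's walreceiver
 * to die and (ideally) restart. */
elog(FATAL, "crashme");
```

After triggering it, the thing to verify is whether a new walreceiver process starts and a fresh "started streaming WAL from primary" line shows up in the endpoint's logs, or whether it stays dead like in the incident above.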