river-build / river

Remove supervisord from node docker (it seems this SIGTERM is from supervisord) #978

Closed jterzis closed 2 months ago

jterzis commented 2 months ago

Describe the bug: A node operator hit a bug earlier while upgrading mainnet to version d3ae9cf, during the vacuum migration step.

To Reproduce:

  1. Upgrade the stream node to version d3ae9cf.
  2. Observe the node stalling on the vacuum step and ultimately issuing SIGTERM.

To remedy:

  1. Restart the node.

Expected behavior: Node startup should run the migration, including vacuum, and then start the service successfully, without stalling or issuing a SIGKILL.

Screenshots: [image: 2024-09-05 16 20 01]

Desktop: DB: Postgres 14.12; Mem: 30 GB; Storage: 500 GB SSD; vCPUs: 8

sergekh2 commented 2 months ago

Report:

[9/17/24 05:41]
Seeing this in their logs:
Error: GetMiniblocks: pg.txRunner: pg.compareUUID: (8:RESOURCE_EXHAUSTED) No longer a current node, shutting down | transaction failed
    currentUUID = uc6SRAUgrQyO
    schema = s0x553859aac181c5568dac867410698c586f3cf1d3
    newUUIDs = []
    name = ReadMiniblocks
    streamId = a1bb8b0aeb1345af046b91117189252456242eb0350000000000000000000000
    req.Msg.StreamId = a1bb8b0aeb1345af046b91117189252456242eb0350000000000000000000000
2024-09-17 08:11:45,567 WARN exited: river_node (exit status 1; not expected)

[9/17/24 05:43]
So it looks like something in the container stopped, but the service did not exit. If the Docker container had exited, systemd would have performed a restart.
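
To illustrate the point about exit propagation, here is a minimal sketch of a container entrypoint that runs the node as its only foreground child, relays SIGTERM/SIGINT to it, and exits with the child's exit code. Under that arrangement, a node exit like the `exit status 1` in the log above would end the container, and an outer supervisor such as systemd could restart it. This is an assumption-laden illustration, not River's actual entrypoint: the function name `run_as_foreground` and the binary path in the usage comment are hypothetical.

```python
import signal
import subprocess
import sys


def run_as_foreground(cmd):
    """Run cmd as a child, relay SIGTERM/SIGINT to it, and return its exit code."""
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Relay the signal so the child can shut down gracefully.
        child.send_signal(signum)

    # Install the relay handlers, remembering the previous ones for cleanup.
    previous = {s: signal.signal(s, forward) for s in (signal.SIGTERM, signal.SIGINT)}
    try:
        code = child.wait()
    finally:
        for signum, handler in previous.items():
            signal.signal(signum, handler)

    # Popen reports death by signal N as -N; map it to 128+N, as a shell would.
    return 128 - code if code < 0 else code


# Hypothetical usage: python entrypoint.py /path/to/river_node <args>
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_as_foreground(sys.argv[1:]))
```

The design choice here is that nothing sits between the node process and the container's lifetime: when the child exits, the wrapper exits with the same status, so the container runtime and systemd see the failure instead of a process manager absorbing it.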