neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0

Bug: compute_ctl can't open neon.tech.log.0 after restart #939

Closed: Omrigan closed 1 month ago

Omrigan commented 2 months ago

The following error message was observed: "cannot create /dev/virtio-ports/neon.tech.log.0: Device or resource busy". It appeared together with memory pressure and an OOM kill of compute_ctl.

The leading hypothesis: when compute_ctl was killed, postgres kept running, so a descriptor for neon.tech.log.0 remained open (postgres still writes to it). Once compute_ctl was restarted, it tried to open neon.tech.log.0 again, which fails because the serial device cannot be opened more than once at a time.
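One way to check this hypothesis on an affected VM is to list which processes still hold the device open. The following is a minimal sketch, assuming a Linux guest with /proc mounted; the helper name find_holders is hypothetical and not part of compute_ctl or this repository.

```python
import os


def find_holders(path, proc_root="/proc"):
    """Return the sorted PIDs that currently have `path` open.

    Hypothetical diagnostic helper: it walks /proc/<pid>/fd and compares
    each descriptor's symlink target with the resolved device path.
    """
    # /dev/virtio-ports/<name> is typically a symlink, so resolve it first.
    target = os.path.realpath(path)
    holders = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.add(int(entry))
                    break
            except OSError:
                continue  # descriptor was closed while we were scanning
    return sorted(holders)


if __name__ == "__main__":
    print(find_holders("/dev/virtio-ports/neon.tech.log.0"))
```

If the hypothesis holds, the list printed after compute_ctl is killed should still contain the postgres PID. Inspecting other users' processes usually requires root.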

Environment

Production

Steps to reproduce

Kill compute_ctl while leaving postgres running.

Possible solutions

  1. Put a layer between the log sources and the serial device, e.g. socat, which multiplexes their output into the serial device.
  2. #578 could be helpful.
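The first option can be illustrated with a small fan-in process. This is only a sketch of the idea, not the proposed implementation: the class name LogMux, the tag prefix, and the use of pipes as log sources are all assumptions made for illustration. The point is that exactly one process opens the sink, and the log producers can come and go without touching it.

```python
import os
import threading


class LogMux:
    """Hypothetical multiplexer: the only holder of the sink descriptor."""

    def __init__(self, sink_path):
        # The sink (e.g. the serial device) is opened exactly once, here.
        self._fd = os.open(sink_path, os.O_WRONLY)
        self._lock = threading.Lock()
        self._threads = []

    def attach(self, source_fd, tag):
        """Start forwarding lines read from source_fd, prefixed with tag."""
        t = threading.Thread(target=self._pump, args=(source_fd, tag), daemon=True)
        t.start()
        self._threads.append(t)

    def _write_all(self, data):
        # os.write may write fewer bytes than requested, so loop until done.
        while data:
            data = data[os.write(self._fd, data):]

    def _pump(self, source_fd, tag):
        with os.fdopen(source_fd, "rb") as src:
            for line in src:
                if not line.endswith(b"\n"):
                    line += b"\n"
                # Hold the lock per line so lines from different sources
                # are never interleaved mid-line.
                with self._lock:
                    self._write_all(tag.encode() + b": " + line)

    def close(self):
        """Wait for every source to reach EOF, then release the sink."""
        for t in self._threads:
            t.join()
        os.close(self._fd)
```

With this shape, a restarted compute_ctl would reconnect to the multiplexer rather than reopen the device, so the single-open restriction is no longer hit. socat or a similar tool would play the same role without custom code.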

Other logs, links

Thread: https://neondb.slack.com/archives/C03TN5G758R/p1716165317982799