tinkerbell / osie

An in-memory installation environment for bare metal.
https://tinkerbell.org
Apache License 2.0
100 stars 30 forks source link

osie-runner triggering SIGSEGV and being terminated #245

Closed nshalman closed 3 years ago

nshalman commented 3 years ago

We've seen some machines where preinstallation has completed and osie-runner is sitting connected to hegel over grpc where it suddenly terminates with exit code 139:

localhost:~# docker ps -a
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS                     PORTS               NAMES
bcc4cb56c565        osie-runner:x86_64   "/entrypoint.sh pyth…"   5 days ago          Exited (139) 7 hours ago                       quizzical_varahamihira
localhost:~# docker logs quizzical_varahamihira --tail 3
}
2021-08-04T10:40:45.952178Z [info     ] no handler for state           [runner] state=provisionable
2021-08-04T10:40:45.952249Z [info     ] about to monitor               [runner]

Expected Behaviour

osie-runner keeps running indefinitely until it receives new information from hegel

Current Behaviour

osie-runner exits unexpectedly with exit code 139

Possible Solution

Update to the latest grpc in hopes that some bug in the core library has been fixed.

Steps to Reproduce (for bugs)

1. 2. 3. 4.

Context

Your Environment

Equinix Metal Production

nshalman commented 3 years ago

Closing as hopefully addressed by #246. Will re-open if we see it again.