metal-stack / metal-hammer

metal-hammer is used to boot bare-metal servers with iPXE and the metal-stack kernel.
GNU Affero General Public License v3.0

Reboot on network errors #101

Closed: majst01 closed this 7 months ago

majst01 commented 1 year ago

closes #100

Maybe we should also add a dhclient implementation to metal-hammer, because kernel DHCP does not retry on errors:

https://github.com/u-root/u-root/blob/main/cmds/boot/pxeboot/pxeboot.go#L77

Current Errors

2023-01-31T07:58:04.498Z        error   failed waiting for allocation   {"retry after": 2, "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup metal.metalstack.cloud on 1.1.1.1:53: read udp 10.255.253.201:60811->1.1.1.1:53: i/o timeout\""}

2023-01-31T07:58:10.633Z        error   event   {"cannot send event": "Alive", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup metal.metalstack.cloud on 1.1.1.1:53: read udp 10.255.253.201:60811->1.1.1.1:53: i/o timeout\""}
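
For illustration only, here is a minimal Go sketch of the idea in the title: retry an operation and request a reboot once network errors keep repeating. It is not the code from this PR. The names `retryOrReboot`, `isNetworkError` and `sampleDNSTimeout`, the failure threshold, and the stubbed reboot callback are all hypothetical. The sketch detects network errors via the standard library's `net.Error`; gRPC errors like the ones logged above would more likely need a check on the gRPC status code instead.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// isNetworkError reports whether err wraps a net.Error, such as a DNS timeout.
// Assumption: the gRPC errors in the log above would not match this check;
// real code would more likely compare status.Code(err) with codes.Unavailable
// from google.golang.org/grpc.
func isNetworkError(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr)
}

// retryOrReboot (hypothetical helper) calls op until it succeeds.
// Non-network errors are returned immediately. After maxFailures network
// errors in a row it calls reboot and returns the last error.
func retryOrReboot(op func() error, maxFailures int, wait time.Duration, reboot func()) error {
	failures := 0
	for {
		err := op()
		if err == nil {
			return nil
		}
		if !isNetworkError(err) {
			return err
		}
		failures++
		if failures >= maxFailures {
			reboot()
			return fmt.Errorf("giving up after %d network errors: %w", failures, err)
		}
		time.Sleep(wait)
	}
}

// sampleDNSTimeout builds an error shaped like the lookup failure in the log.
func sampleDNSTimeout() error {
	return fmt.Errorf("failed waiting for allocation: %w", &net.DNSError{
		Err:       "i/o timeout",
		Name:      "metal.metalstack.cloud",
		Server:    "1.1.1.1:53",
		IsTimeout: true,
	})
}

func main() {
	attempts := 0
	err := retryOrReboot(
		func() error { attempts++; return sampleDNSTimeout() },
		3,                   // assumed threshold
		10*time.Millisecond, // assumed wait between attempts
		func() { fmt.Println("reboot requested (stub, no real reboot)") },
	)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

Passing the reboot action in as a callback keeps the sketch safe to run and easy to test; a real machine would substitute an actual reboot call there.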

Gerrit91 commented 1 year ago

Needs rebase

majst01 commented 7 months ago

Won't be taken.