tinkerbell / smee

DHCP and iPXE Server
https://tinkerbell.org
Apache License 2.0

Smee panic in proxy mode #482

Closed waldner closed 1 month ago

waldner commented 1 month ago

When running smee in proxy mode, the second PXE boot crashes it.

Current Behaviour

I am running smee with -dhcp-mode=proxy. On first boot the client boots fine, loads hooks, and runs through the provisioning template. After this, I set allowPXE: false and allowWorkflow: false in the corresponding hardware object. I then reboot the client (which is still configured to PXE boot), and the PXE request crashes smee:

{"level":"info","ts":1721220400.4172351,"caller":"smee/main.go:124","msg":"starting","version":"02731c4"}
{"level":"info","ts":1721220400.4172654,"caller":"smee/main.go:129","msg":"starting syslog server","bind_addr":"0.0.0.0:514"}
{"level":"info","ts":1721220400.4172966,"caller":"smee/main.go:158","msg":"starting tftp server","bind_addr":"0.0.0.0:69"}
{"level":"info","ts":1721220400.4176638,"logger":"github.com/tinkerbell/ipxedust","caller":"ipxedust@v0.0.0-20231215220341-a535c5deb47a/ipxedust.go:201","msg":"serving iPXE binaries via TFTP","service":"github.com/tinkerbell/smee","addr":"0.0.0.0:69","blocksize":512,"timeout":5,"singlePortEnabled":true}
{"level":"info","ts":1721220400.422946,"caller":"smee/main.go:220","msg":"serving http","addr":"0.0.0.0:7171","trusted_proxies":["10.1.0.0/16","10.0.1.0/16"]}
{"level":"info","ts":1721220400.4255617,"caller":"smee/main.go:233","msg":"starting dhcp server","bind_addr":"0.0.0.0:67"}
{"level":"info","ts":1721220400.4256585,"caller":"server/dhcp.go:35","msg":"Server listening on","addr":"0.0.0.0:67"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x158d2f8]

goroutine 91 [running]:
github.com/tinkerbell/smee/internal/dhcp/handler/proxy.(*Handler).Handle(0xc00050cb40, {0x1c1b440, 0xc00017e690}, 0xc000564190, {{0x1c074b0?, 0xc000630e10?}, 0xc00080c370?, 0xc00061c630?})
    /home/runner/work/smee/smee/internal/dhcp/handler/proxy/proxy.go:190 +0x1338
created by github.com/tinkerbell/smee/internal/dhcp/server.(*DHCP).Serve in goroutine 51
    /home/runner/work/smee/smee/internal/dhcp/server/dhcp.go:86 +0x6ef

Your Environment

Running smee in Kubernetes (microk8s) using the tinkerbell helm chart 0.4.4; the smee image is v0.11.0. The crash is consistently reproducible.

smee:
  additionalArgs: ["-dhcp-mode=proxy"]
  trustedProxies: ["10.1.0.0/16","10.0.1.0/16"]
  logLevel: debug
  hostNetwork: true
  publicIP: 10.110.0.12

EDIT: This does not happen in "normal" (i.e., reservation) DHCP mode. When the host with allowPXE: false reboots, smee does not crash and (from what I can see) offers the netboot-not-allowed file as the boot file. That file does not exist, so the client gets a PXE boot error and then proceeds to boot from the hard drive. A bit rough, possibly, but it works, so I'd expect at least the same behavior on the second boot in proxy mode.

{"level":"info","ts":1721223489.1344376,"caller":"reservation/handler.go:141","msg":"sent DHCP response","mac":"58:11:22:32:83:0d","xid":"0xb9bded33","interface":"calic5d7ed3f76a","type":"ACK","bootFileName":"/netboot-not-allowed","nextServer":"0.0.0.0","ipAddress":"10.112.0.65","destination":"10.1.30.15:67"}
{"level":"error","ts":1721223489.1371915,"logger":"github.com/tinkerbell/ipxedust","caller":"itftp/itftp.go:91","msg":"file unknown","service":"github.com/tinkerbell/smee","event":"get","filename":"netboot-not-allowed","uri":"/netboot-not-allowed","client":{"IP":"10.1.30.15","Port":37262,"Zone":""},"macFromURI":"","error":"file [netboot-not-allowed] unknown: file does not exist","stacktrace":"github.com/tinkerbell/ipxedust/itftp.Handler.HandleRead\n\t/home/runner/go/pkg/mod/github.com/tinkerbell/ipxedust@v0.0.0-20231215220341-a535c5deb47a/itftp/itftp.go:91\ngithub.com/pin/tftp/v3.(*Server).handlePacket.func2\n\t/home/runner/go/pkg/mod/github.com/pin/tftp/v3@v3.1.0/server.go:455"}
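
The reservation-mode behavior visible in these two log lines can be sketched as follows. This is an illustration only: the function `bootFileName` and the placeholder file name `example.efi` are assumptions, and only the `/netboot-not-allowed` value is taken from the log above.

```go
package main

import "fmt"

// notAllowedFile matches the bootFileName value seen in the DHCP ACK log line.
const notAllowedFile = "/netboot-not-allowed"

// bootFileName is a hypothetical helper, not Smee's real code. When PXE is
// disallowed it returns a file name the TFTP server does not serve, so the
// client's PXE attempt fails and its firmware moves on to the next boot device.
func bootFileName(allowPXE bool, normal string) string {
	if !allowPXE {
		return notAllowedFile
	}
	return normal
}

func main() {
	fmt.Println(bootFileName(false, "example.efi")) // prints "/netboot-not-allowed"
	fmt.Println(bootFileName(true, "example.efi"))  // prints "example.efi"
}
```

The "file unknown" TFTP error in the second log line is therefore expected in this scheme: the failed download is what pushes the client to fall back to its hard drive.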
jacobweinstock commented 1 month ago

Hey @waldner, thanks for reporting this! I have opened #483 to resolve this.

waldner commented 1 month ago

Thanks! Will this be included in a new smee image (e.g. v0.12.0)? And when will it be published?

jacobweinstock commented 1 month ago

Yes, it will be in v0.12.0. I'm hoping to get that out by the end of next week. It is also available now as quay.io/tinkerbell/smee:sha-47170cde and quay.io/tinkerbell/smee:latest.

waldner commented 1 month ago

Trying to use the new image (smee:sha-47170cde) with a 0.4.4 helm chart, I'm getting this error:

$ kubectl logs -n tink-system smee-69d7dcddc6-zqjmc
flag provided but not defined: -dhcp-http-ipxe-binary-url
Smee is the DHCP and Network boot service for use in the Tinkerbell stack.

USAGE
  smee [flags]
...
jacobweinstock commented 1 month ago

Yeah, the top of tree for Smee has CLI flag changes. Here are the Helm chart updates that are needed: https://github.com/tinkerbell/charts/pull/111 These will land in the Charts repo once Smee is released. You can also see the available CLI flags by running: docker run -it --rm quay.io/tinkerbell/smee:sha-47170cde -h

waldner commented 1 month ago

Thanks again.