metal-stack / metal-hammer

metal-hammer is used to boot bare metal servers with ipxe and the metal-stack kernel
GNU Affero General Public License v3.0

machines not entering wait-for-allocation mode #69

Closed mwennrich closed 5 months ago

mwennrich commented 2 years ago

Sometimes machines emit the "waiting for allocation" event but do not actually enter wait-for-allocation mode.

Actual output:

INFO[12-06|09:32:25] bios                                     message="successfully configured BIOS" caller=bios.go:23
INFO[12-06|09:32:25] event                                    event=Waiting message="waiting for allocation" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 55
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Waiting","message":"waiting for allocation"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:32:25 GMT

DBUG[12-06|09:32:27] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:32] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:37] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
INFO[12-06|09:32:39] event                                    event=Alive message="still alive at: 2021-12-06 09:32:39.587602281 +0000 UTC m=+60.547598084" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 102
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Alive","message":"still alive at: 2021-12-06 09:32:39.587602281 +0000 UTC m=+60.547598084"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:32:39 GMT

DBUG[12-06|09:32:42] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:47] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:52] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:57] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:03] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:08] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:13] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:18] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:23] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:28] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:33] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:38] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
INFO[12-06|09:33:39] event                                    event=Alive message="still alive at: 2021-12-06 09:33:39.587209833 +0000 UTC m=+120.547205638" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 103
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Alive","message":"still alive at: 2021-12-06 09:33:39.587209833 +0000 UTC m=+120.547205638"}

Expected output:

INFO[12-06|09:36:43] bios                                     message="successfully configured BIOS" caller=bios.go:23
INFO[12-06|09:36:43] event                                    event=Waiting message="waiting for allocation" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 55
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Waiting","message":"waiting for allocation"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:36:43 GMT

DBUG[12-06|09:36:44] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
INFO[12-06|09:36:48] wait for allocation...                   machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57
DBUG[12-06|09:36:49] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
INFO[12-06|09:36:53] wait for allocation...                   machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57
DBUG[12-06|09:36:55] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
INFO[12-06|09:36:56] event                                    event=Alive message="still alive at: 2021-12-06 09:36:56.753981147 +0000 UTC m=+60.559026527" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 102
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Alive","message":"still alive at: 2021-12-06 09:36:56.753981147 +0000 UTC m=+60.559026527"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:36:56 GMT

INFO[12-06|09:36:58] wait for allocation...                   machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57
DBUG[12-06|09:37:00] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71

majst01 commented 2 years ago

I can't see the difference between these outputs.

mwennrich commented 2 years ago

The failed one is missing the following lines:

INFO[12-06|09:36:48] wait for allocation... machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57
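
To spot this difference without reading the logs by eye, something like the following rough Go sketch could be used. It is not part of metal-hammer: `enteredWaitLoop` is a hypothetical helper, and it assumes the only reliable markers are the `event=Waiting` line from event.go and the `wait for allocation...` line from wait.go, as they appear in the outputs above. The sample logs are excerpts from this issue with whitespace collapsed and the LLDP and HTTP lines omitted.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Excerpt of the failing run: the Waiting event is sent, but no
// "wait for allocation..." line from wait.go ever follows.
const failingLog = `INFO[12-06|09:32:25] event event=Waiting message="waiting for allocation" caller=event.go:62
INFO[12-06|09:32:39] event event=Alive message="still alive at: 2021-12-06 09:32:39.587602281 +0000 UTC m=+60.547598084" caller=event.go:62`

// Excerpt of the expected run: the Waiting event is followed by the
// periodic "wait for allocation..." line.
const expectedLog = `INFO[12-06|09:36:43] event event=Waiting message="waiting for allocation" caller=event.go:62
INFO[12-06|09:36:48] wait for allocation... machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57`

// enteredWaitLoop (hypothetical helper) reports whether a
// "wait for allocation..." line appears after the Waiting event.
func enteredWaitLoop(log string) bool {
	sawWaitingEvent := false
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, "event=Waiting"):
			sawWaitingEvent = true
		case sawWaitingEvent && strings.Contains(line, "wait for allocation..."):
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("failing log entered wait loop:", enteredWaitLoop(failingLog))
	fmt.Println("expected log entered wait loop:", enteredWaitLoop(expectedLog))
}
```

For the two excerpts this prints `false` for the failing run and `true` for the expected one. Plain substring matching is enough here because "waiting for allocation" does not contain "wait for allocation...".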

majst01 commented 2 years ago

Most probably an issue with metal-api: https://github.com/metal-stack/metal-api/issues/243

majst01 commented 1 year ago

Can this be closed, since it never happened again?

majst01 commented 5 months ago

Nothing new here, so closing.