metal-stack / metal-hammer

metal-hammer is used to boot bare metal servers with ipxe and the metal-stack kernel
GNU Affero General Public License v3.0

Prevent metal-hammer blocking long when Loki is unavailable #132

Closed majst01 closed 1 month ago

majst01 commented 2 months ago

References #131.

Gerrit91 commented 2 months ago

Still seems to behave a little unexpectedly (drop handler not called?):

{"time":"2024-07-18T13:51:10.792153119Z","level":"INFO","msg":"waiting 10 sec to enable os debugging","machineID":"00000000-0000-0000-0000-ac1f6b7aeb76"}
{"time":"2024-07-18T13:51:11.859914998Z","level":"DEBUG","msg":"lldp","machineID":"00000000-0000-0000-0000-ac1f6b7aeb76","detectedNeighbor":{"Name":"fra-equ01-r01leaf02","Description":"Cumulus Linux version 3.7.16 running on Accton AS7712-32X","PortDescription":"swp1s2","Interface":"eth5","Chassis":{"Type":"Mac","Value":"b8:6a:97:73:f8:3a"},"Port":{"Type":"Mac","Value":"b8:6a:97:73:f8:3d"}}}
{"time":"2024-07-18T13:51:11.860072215Z","level":"DEBUG","msg":"lldp","machineID":"00000000-0000-0000-0000-ac1f6b7aeb76","detectedNeighbor":{"Name":"fra-equ01-r01leaf01","Description":"Cumulus Linux version 3.7.16 running on Accton AS7712-32X","PortDescription":"swp1s2","Interface":"eth4","Chassis":{"Type":"Mac","Value":"b8:6a:97:74:00:3a"},"Port":{"Type":"Mac","Value":"b8:6a:97:74:00:3d"}}}
{"time":"2024-07-18T13:51:20.793324684Z","level":"INFO","msg":"event","machineID":"00000000-0000-0000-0000-ac1f6b7aeb76","event":"Booting New Kernel","message":"booting into distro kernel"}
level=warn component=client host=loki.metal-stack.dev msg="error sending batch, will retry" status=-1 error="Post \"https://loki.metal-stack.dev/loki/api/v1/push\": context deadline exceeded"
level=error component=client host=loki.metal-stack.dev msg="final error sending batch" status=-1 error="Post \"https://loki.metal-stack.dev/loki/api/v1/push\": context deadline exceeded"
{"time":"2024-07-18T13:51:30.794559249Z","level":"ERROR","msg":"event","machineID":"00000000-0000-0000-0000-ac1f6b7aeb76","cannot send event":"Booting New Kernel","error":"rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded"}

Gerrit91 commented 1 month ago

Just setting the timeouts properly should be sufficient. The other things we tried did not really work. The way it is now, the logger will not be blocked for longer than one second.