Closed scottyeager closed 1 year ago
Unfortunately, this issue is not zos related. The initial download is done by the node BIOS (ipxe), so it can be due to one of the following causes:
In all cases this can't be a zos issue, because zos was not even booted yet. Hence I will move it to 0-initramfs.
Kernels are served by the bootstrap service, which, for the kernel part, only forwards the binary. I'll try to move that serving from Flask to Caddy on the front end to offload the web service. I have no clue yet whether that will help, but it's worth a try. I'll keep you posted when it's done.
Otherwise, yes, it's more likely a client-side issue, but it could be related to ipxe not supporting the firmware well. I can update ipxe for some clients to see if that brings any improvement.
It's pushed into production: kernels are served by the front-end Caddy now. Let's see if there is any improvement :)
Great, thanks @maxux. I'll keep an eye on reports from farmers to see if this helps, and also suggest ensuring that firmware is up to date.
@scottyeager Anything new regarding this? Can I close the issue? :)
No reports of this issue lately. I'll close. Thanks @maxux.
Farmers sometimes report that the bootstrapping process hangs at `Downloading Zero-OS image...`. While this is often resolved by rebooting the node, a reboot doesn't always help. One farmer noticed this behavior when attempting to boot multiple nodes simultaneously:
With WoL coming, having nodes stuck in this state will be a much bigger problem. Although the root cause may indeed be within the farmer's network or at the flist hub, the bootstrap code should at least be able to recover via some kind of timeout and retry in cases where a manual reboot is sufficient.
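To make the timeout-and-retry idea concrete, here is a minimal sketch of the logic in Python. It is only an illustration: the real bootstrap step runs inside ipxe, not Python, and the names `fetch_once` and `download_with_retry`, the attempt count, the timeout, and the backoff values are all assumptions made up for this example.

```python
import time
import urllib.request


def fetch_once(url, timeout):
    """Download url; a socket operation blocked longer than `timeout` seconds raises."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_retry(url, attempts=3, timeout=30.0, backoff=2.0,
                        fetch=fetch_once, sleep=time.sleep):
    """Try the download up to `attempts` times, waiting longer between tries.

    All defaults are hypothetical values chosen for illustration.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url, timeout)
        except OSError as error:  # URLError and socket timeouts are OSError subclasses
            last_error = error
            if attempt < attempts:
                # Linear backoff: 2s, 4s, ... with the default settings.
                sleep(backoff * attempt)
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

The point is that a stalled download becomes an error after a bounded time instead of hanging forever, and the caller gets a few automatic retries before anyone has to reboot the node by hand.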