jsdelivr / globalping-hwprobe

OS image for Globalping hardware probes. Become a GitHub Sponsor to get yours!

Update node.js on the existing probes #37

Closed MartinKolarik closed 7 months ago

MartinKolarik commented 9 months ago

Since the existing probes don't have an automated container update mechanism (https://github.com/jsdelivr/globalping-hwprobe/issues/28) and we have a similar issue with people who run the probes in VMs and don't update their docker containers, I've been exploring the option of adding an update mechanism for node.js directly into the docker container. The reason is that the node.js version used on the existing probes already reached EOL, which also prevents us from updating npm packages in our code base (since those usually stop supporting EOL versions quickly).

I have a repository hosting a modified version of the probe software, which includes this feature, with the main part being this file. It's based on an nvm script, which handles the whole process of downloading and verification.

I've also built a custom version of the FW to test this out on the HW probes. Unfortunately, while it works reliably in docker on my PC, it fails most of the time on the HW probe. I'm not 100% sure why since the failure results in a restart and loss of all logs, but my guess is that there is not enough RAM to handle the downloaded file combined with the update itself. Occasionally, though, it works - it just takes many reboots to succeed. The problem is it can often take 1 hour or more of retries, and since the update is in-memory only, it needs to happen again if the probe is shut down.

@kernelgurumeditation it would be great if you could take a look at this, verify the real cause of failures, and see if we can make any improvements/changes that would make the update work reliably (it is fine if it takes, let's say, 5 minutes after each startup). However, keep in mind that the goal is to make this work on the existing probes without user interaction, so we're limited to making changes to the update process itself; no FW changes will help here.

kernelgurumeditation commented 7 months ago

@MartinKolarik, apologies. I did look at it when you opened the issue, but I forgot to get back to you with my findings. The current HW probe doesn't have enough RAM: the download process plus the already-running container exhaust the available RAM to the point that the kernel has to flush all the caches to free some up. That makes the system grind almost to a halt, and the H3 watchdog then reboots it as a last resort.

But your idea is perfect for the probes running on VMs. I'm not a Node.js coder, but maybe you could make the update depend on the container's hardware?

Something like:

if (not_running_in_hwprobe()) update_nodejs()
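One way the suggestion above could look in Node.js is to gate the update on the amount of RAM the host reports, since RAM exhaustion is the failure described. This is only a rough sketch: `shouldUpdateNode` and the 1 GiB threshold are hypothetical names and values chosen for illustration, not part of the probe code.

```javascript
const os = require('node:os');

// Hypothetical threshold (an assumption, not a measured value): hosts with
// less than 1 GiB of RAM skip the in-container Node.js update.
const MIN_TOTAL_MEM_BYTES = 1024 ** 3;

// Returns true when the host reports enough memory to attempt the update.
// Both parameters are injectable so the check can be exercised without
// depending on the machine it runs on.
function shouldUpdateNode(totalMemBytes = os.totalmem(), minBytes = MIN_TOTAL_MEM_BYTES) {
  return totalMemBytes >= minBytes;
}

if (shouldUpdateNode()) {
  console.log('enough RAM reported, the update could be attempted');
} else {
  console.log('low-memory host, skipping the update');
}
```

A memory-based gate has the side effect of also skipping small VMs, so it may need to be combined with a more specific hardware check.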

MartinKolarik commented 7 months ago

But your idea is perfect for the probes running on VMs. I'm not a Node.js coder, but maybe you could make the update depend on the container's hardware? Something like: if (not_running_in_hwprobe()) update_nodejs()

Yes, that's a backup solution we can do, but we wanted to check first if there's anything we can do about the HW probes. Seeing that the answer is likely no, @jimaek I suggest we move forward with the update process for docker at least and see how many probes remain outdated afterwards.

MartinKolarik commented 7 months ago

@kernelgurumeditation actually, we have another task where we need a check like not_running_in_hwprobe() as well, maybe you can help with that part? Could you post the output of the following command when run in the container on our v1 HW probe?

node -e 'console.log(os.arch(), os.cpus(), os.machine(), os.platform(), os.totalmem(), os.hostname())'

And if you think of anything else that could be used to detect the HW probe, let me know as well.

kernelgurumeditation commented 7 months ago

@MartinKolarik

In firmware V2, we will be exporting these variables to the container:

export GP_HOST_HW=true
export GP_HOST_DEVICE=v1

arm [
  { model: 'ARMv7 Processor rev 5 (v7l)', speed: 1008, times: { user: 2877680, nice: 0, sys: 3510380, idle: 51429670, irq: 0 } },
  { model: 'ARMv7 Processor rev 5 (v7l)', speed: 1008, times: { user: 2958640, nice: 200, sys: 3448240, idle: 51628800, irq: 0 } },
  { model: 'ARMv7 Processor rev 5 (v7l)', speed: 1008, times: { user: 2910440, nice: 0, sys: 3442280, idle: 51727280, irq: 0 } },
  { model: 'ARMv7 Processor rev 5 (v7l)', speed: 1008, times: { user: 2393870, nice: 0, sys: 6452470, idle: 47907330, irq: 0 } }
] armv7l linux 520839168 globalping-probe-708e
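Taken together, the exported variables and the `os` output above suggest how a HW-probe check could be sketched. The function name `isHwProbe` and the fallback heuristic are assumptions made for illustration: only the `GP_HOST_HW` variable and the sample values (arch `arm`, about 0.5 GB of RAM, hostname `globalping-probe-708e`) come from this thread, and a single sample may not be representative of every device.

```javascript
const os = require('node:os');

// Collects the host properties used by the heuristic below.
function readSystemInfo() {
  return { arch: os.arch(), totalmem: os.totalmem(), hostname: os.hostname() };
}

// Hypothetical check: returns true when the container appears to run on a
// Globalping HW probe. `env` and `sys` are injectable for testing.
function isHwProbe(env = process.env, sys = readSystemInfo()) {
  // Firmware V2 is expected to export GP_HOST_HW=true into the container.
  if (env.GP_HOST_HW === 'true') {
    return true;
  }

  // Fallback for firmware without that variable (assumption based on one
  // sample): 32-bit ARM, under 1 GiB of RAM, and a hostname that starts
  // with "globalping-probe-".
  return sys.arch === 'arm'
    && sys.totalmem < 1024 ** 3
    && sys.hostname.startsWith('globalping-probe-');
}

console.log('running on a HW probe:', isHwProbe());
```

The environment variable is checked first because it is explicit. The fingerprint fallback could misclassify another small ARM host that happens to use a similar hostname.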

MartinKolarik commented 7 months ago

Thanks, can you also try uname -a ?

kernelgurumeditation commented 7 months ago

@MartinKolarik

uname -a

Linux globalping-probe-708e 5.10.117 #1 SMP Wed May 18 08:23:49 UTC 2022 armv7l GNU/Linux

MartinKolarik commented 7 months ago

Thanks. It turns out all HW devices are sufficiently up-to-date, so we'll proceed with updating only non-HW probes for now.

MartinKolarik commented 6 months ago

@kernelgurumeditation can you please also let me know what's the output of df --block-size=MB ?

kernelgurumeditation commented 6 months ago

@MartinKolarik

as requested:

[image: df --block-size=MB output]