jhuckaby / Cronicle

A simple, distributed task scheduler and runner with a web based UI.
http://cronicle.net

Cronicle install fails if there are limited resources e.g. RAM #772

Closed sv87411 closed 4 months ago

sv87411 commented 4 months ago

Summary

First, Cronicle, GREAT! Am loving it. So much so I've been setting up all of my Proxmox Containers with the Cronicle worker so I can use Cronicle to schedule jobs on them.

My observation is: it would help if the minimum system requirements (memory, disk, etc.) were documented. Memory is especially important when Cronicle runs on virtual machines/containers. Strictly speaking these are the system requirements for Node.js and NPM, but if those fail to install or set up correctly, so does Cronicle.

Or it would help to maybe check if possible the memory/disk available before installing. I know in this modern world this isn't often required, but when working with virtual machines/containers - as I found to my peril - you could have a device with very limited resources.

I have a Proxmox container running Pihole and only had 128MB of RAM assigned to the container - which is fine for Pihole. Node.js installed fine. Cronicle installed fine (it seemed), but it actually silently failed. I didn't find this until I broke down the install script.

During the "Executing command: npm install --unsafe-perm" section of the install, NPM actually fails, but this is not logged in the Cronicle install log. When I ran the install process manually, the NPM command failed with a really useless 'Killed' message. I Googled this and found that it usually means NPM has insufficient memory.

I increased the container's memory allocation to 1GB and all worked fine.

Additionally, it would be useful to have an uninstall process, or a way to force an install or reset the config. While I was debugging this, fixing a half-finished NPM install was messy and I had to go back to scratch. I found that removing /opt/cronicle (or preferably renaming it to /opt/cronicle.old) worked, and the install then runs through again OK.

Steps to reproduce the problem

1 - Install the Cronicle worker on a device with limited memory.
2 - After the Cronicle install, try to execute the "npm run boot" command to set up the systemd service.
3 - This will fail because "pixl-boot" isn't installed, because the main Cronicle install failed.

Your Setup

Proxmox container with only 128MB RAM.

Operating system and version?

Proxmox 8.2 with a container created using a Debian 12 template.

Node.js version?

v20.14.0

Cronicle software version?

Version 0.9.51

Are you using a multi-server setup, or just a single server?

Single server, multiple workers.

Are you using the filesystem as back-end storage, or S3/Couchbase?

No

Can you reproduce the crash consistently?

Yes, see above.

Log Excerpts

Unfortunately the install log doesn't show any error as this is NPM failing.

jhuckaby commented 4 months ago

Okay, so I've been testing this all night long. I don't know anything about "Proxmox", but I did use Docker to limit the memory size.

With a 128MB limit on the container, the install process works just fine:

docker run -it --memory 128m --init node:lts bash

root@618e96469fd5:/# curl -s https://raw.githubusercontent.com/jhuckaby/Cronicle/master/bin/install.js | node

... snip ...

Executing command: npm install --unsafe-perm

> Cronicle@0.9.52 postinstall
> pixl-boot install

Installing startup service: Cronicle...OK.
Successfully registered startup service.

added 247 packages, and audited 248 packages in 1s

8 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

Docker stats shows that it never used above 81MB or so:

jhuckaby@joemax ~ $ docker stats --no-stream
CONTAINER ID   NAME             CPU %     MEM USAGE / LIMIT   MEM %     NET I/O          BLOCK I/O        PIDS
a667392275a7   cool_engelbart   79.79%    81.02MiB / 128MiB   63.30%    12.8MB / 276kB   54.7MB / 250MB   21

I then tried it with a 64MB RAM limit, with the same results. It installed just fine. It took quite a bit longer (about 7 seconds) but it installed successfully:

docker run -it --memory 64m --init node:lts bash

root@618e96469fd5:/# curl -s https://raw.githubusercontent.com/jhuckaby/Cronicle/master/bin/install.js | node

... snip ...

Executing command: npm install --unsafe-perm

> Cronicle@0.9.52 postinstall
> pixl-boot install

Installing startup service: Cronicle...OK.
Successfully registered startup service.

added 247 packages, and audited 248 packages in 7s

8 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

Docker stats shows that it was pegged at 59MB usage or so. Since it took so long, I believe it was RAM-constrained, probably using swap. So we're close to the limit here. Docker stats:

jhuckaby@joemax ~ $ docker stats --no-stream
CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT   MEM %     NET I/O          BLOCK I/O         PIDS
2cdb92feb739   hopeful_benz   0.01%     59.45MiB / 64MiB    92.89%    11.5MB / 238kB   1.78MB / 91.3MB   21

So then I dropped it all the way to a 32MB RAM limit. This totally failed (as expected). Nothing can run with that little memory. But it failed in a good way -- see here:

docker run -it --memory 32m --init node:lts bash

root@618e96469fd5:/# curl -s https://raw.githubusercontent.com/jhuckaby/Cronicle/master/bin/install.js | node

Cronicle Installer v1.5
Copyright (c) 2015 - 2022 PixlCore.com. MIT Licensed.
Log File: /opt/cronicle/logs/install.log

Fetching release list...
Installing Cronicle v0.9.52...
Installing dependencies...
Killed

ERROR: Failed to install dependencies: Error: Command failed: npm install --unsafe-perm
Killed

As you can see the NPM command failed, but the output of the NPM command was captured (the output is "Killed", meaning the kernel killed the process, due to OOM). It also failed the entire install process.

This result and error message were also logged in the installer log file (/opt/cronicle/logs/install.log).

Everything is working as designed here 🤷🏻‍♂️

I think what happened in your case is that the NPM subprocess wasn't killed, but rather the Node.js install.js process itself died due to OOM. Both processes are running simultaneously during install, and who knows which one the Linux kernel will decide to kill in an OOM situation. If that happens, there is nothing much we can do. There's no way to detect that. 🤷🏻
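
For the case where it is the NPM subprocess that gets killed, the parent can tell from how the child ended. A minimal sketch, assuming Node.js on Linux; `runAndReport` is a hypothetical helper, not part of the Cronicle installer, and the child here kills itself with SIGKILL only to imitate the OOM killer:

```javascript
const { spawnSync } = require('child_process');

// Run a command synchronously and describe how it ended.
// Hypothetical helper for illustration only.
function runAndReport(cmd, args) {
  const result = spawnSync(cmd, args, { encoding: 'utf8' });

  if (result.error) {
    // The process could not be started at all (e.g. command not found).
    return { ok: false, reason: 'could not start: ' + result.error.message };
  }
  if (result.signal) {
    // SIGKILL is what the kernel OOM killer sends, although other
    // things can send it too, so this is a hint rather than proof.
    return { ok: false, reason: 'terminated by signal ' + result.signal };
  }
  if (result.status !== 0) {
    return { ok: false, reason: 'exited with code ' + result.status };
  }
  return { ok: true, reason: 'exited normally' };
}

// Demo: the child sends SIGKILL to itself, standing in for an OOM kill.
const report = runAndReport(process.execPath, [
  '-e',
  "process.kill(process.pid, 'SIGKILL')"
]);
console.log(report);
```

If the command is started through a shell, the shell may report the kill as exit code 137 (128 + 9) instead of passing the signal up. None of this helps when the parent process itself is the one that gets killed.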

> ...it would help to maybe check if possible the memory/disk available before installing.

Well, it's rather difficult to measure the available memory inside of a container. The VM kernel can't see the limits, as they're "soft limits" (using free -m shows the total RAM of the host machine, as does the Node.js os.totalmem call). So I can't easily add something that measures the available memory and throws an error if it is too low.
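
One possible workaround, as a hedged sketch: on Linux, a container's memory cap is usually exposed through the cgroup filesystem, so a script can try to read it and fall back to os.totalmem(). The two paths below are the conventional cgroup v2 and v1 locations and are assumptions that may not hold on every system; `parseCgroupLimit` and `effectiveMemoryLimit` are hypothetical names, not part of Cronicle:

```javascript
const fs = require('fs');
const os = require('os');

// Conventional cgroup mount points (assumed; may differ per system).
const CGROUP_LIMIT_FILES = [
  '/sys/fs/cgroup/memory.max',                  // cgroup v2
  '/sys/fs/cgroup/memory/memory.limit_in_bytes' // cgroup v1
];

// Parse the contents of a cgroup limit file.
// Returns the limit in bytes, or null if unlimited or unparseable.
function parseCgroupLimit(text) {
  const value = String(text).trim();
  if (value === 'max') return null;      // cgroup v2 spelling of "no limit"
  if (!/^\d+$/.test(value)) return null; // anything unexpected
  return Number(value);
}

// Return the smallest memory limit we can find, in bytes.
function effectiveMemoryLimit(files = CGROUP_LIMIT_FILES) {
  const total = os.totalmem();
  for (const file of files) {
    let text;
    try {
      text = fs.readFileSync(file, 'utf8');
    } catch (err) {
      continue; // file missing or unreadable: try the next candidate
    }
    const limit = parseCgroupLimit(text);
    // cgroup v1 reports a very large number when no limit is set,
    // so only trust limits below the reported total.
    if (limit !== null && limit < total) return limit;
  }
  return total;
}

const mb = Math.round(effectiveMemoryLimit() / (1024 * 1024));
console.log('Effective memory limit: ' + mb + ' MB');
```

When neither file can be read, the sketch simply reports os.totalmem(), so it is never worse than the plain call.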

> Additionally, it would be useful to have an uninstall process, or a way to force an install or reset the config. While I was debugging this, fixing a half-finished NPM install was messy and I had to go back to scratch.

Err, well, that's really easy. Just blow away the dir and run the single install command again. It's quite literally just this:

rm -rf /opt/cronicle
curl -s https://raw.githubusercontent.com/jhuckaby/Cronicle/master/bin/install.js | node

I don't know how to make that any easier, sorry 😬 🤷🏻‍♂️

jhuckaby commented 4 months ago

You might want to try a Docker image that already has everything built, so you don't have to run the installer or hit NPM at all: https://github.com/cronicle-edge/cronicle-edge/blob/main/Dockerfile

(Note: This is an unofficial fork of Cronicle that I don't maintain, but the author is cool).

sv87411 commented 4 months ago

Thank you for so much investigation, I really didn't mean for you to test things all night! Above and beyond though so I'm really grateful!

First, Proxmox is fantastic (even though I only use it in a home lab environment). Proxmox uses LXC containerisation - basically a virtualised OS with each container sharing the host node's kernel rather than Docker's virtualised apps concept. It allows you the freedom to quickly fire up a container (using an OS template, Debian, Alpine, Ubuntu etc) to use as a dedicated server. But obviously you have to maintain the container to ensure apps/OS components are updated. This is much easier to do with Docker by just updating the image file.

I do run a couple of Docker instances - inside Proxmox containers funnily enough. So I'm already aware of it. I didn't really want to go installing Docker on all the devices I was going to send Cronicle events off to - a lot of which are just small Proxmox containers. I was happier to install Node.js/NPM on each node though.

Because the Proxmox container is a full OS instance the memory resources you allocate the container are all it gets/sees. Therefore a 'free -m' truly does show whatever it has. However, looking at your investigation this is probably what's tripped me up, the container memory is also used by the OS. With Docker the memory is allocated to the container/app and it can utilise it all.

I just fired up a test container with 128MB and only 84MB free.

root@test:~# free -m
               total        used        free      shared  buff/cache   available
Mem:             128          18          84           0          25         109
Swap:              0           0           0

I also restored a version of my Pihole container from before I got started with Cronicle, and I can see I really restricted its resources. Oops, only 51MB free and no swap! It looks like I might have been just on the edge for Node/NPM. But the 'npm install --unsafe-perm' certainly didn't fail very elegantly for me, it simply produced 'killed' and that was it.

root@pihole-test:~# free -m
               total        used        free      shared  buff/cache   available
Mem:             128          31          51           3          47          96
Swap:              0           0           0
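
The "free" and "available" columns above come from /proc/meminfo (MemFree and MemAvailable), and the second is usually the better guide to what a new process can actually get. A small sketch of reading those fields, assuming a Linux-style /proc/meminfo; `meminfoField` is a hypothetical helper, and the sample values are illustrative, chosen to match the Pihole output above:

```javascript
// Extract one field (in kB) from /proc/meminfo-style text.
// Assumes a plain alphanumeric field name; returns null if it is absent.
function meminfoField(text, name) {
  const pattern = new RegExp('^' + name + ':\\s+(\\d+)\\s*kB', 'm');
  const match = pattern.exec(text);
  return match ? Number(match[1]) : null;
}

// Illustrative sample: 128 MiB total, 51 MiB free, 96 MiB available.
const sample = [
  'MemTotal:         131072 kB',
  'MemFree:           52224 kB',
  'MemAvailable:      98304 kB'
].join('\n');

const freeMB = meminfoField(sample, 'MemFree') / 1024;
const availableMB = meminfoField(sample, 'MemAvailable') / 1024;
console.log('free: ' + freeMB + ' MB, available: ' + availableMB + ' MB');
```

On a real system the text would come from reading /proc/meminfo rather than from the hard-coded sample.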

Thank you again for the investigation. This has been a useful learning process for me, and a realisation that I need to pay more attention to the resources I give Proxmox containers, especially if I set them up for a specific purpose and then expand what I run on them at a later date.

I still think it might be useful to do a quick memory check and, if there's less than 64MB available, for example, issue a warning. I know a lot of people run Proxmox, and when you can fire up a container so quickly and give it only the resources it needs to do the job, you really could be running in a virtualised world with very limited resources.

Cronicle is excellent and I've migrated so many things to it now that were previously scheduled on disparate platforms, so I'm incredibly grateful for the work you've put into it! Thank you!

jhuckaby commented 4 months ago

> But the 'npm install --unsafe-perm' certainly didn't fail very elegantly for me, it simply produced 'killed' and that was it.

I think that's normal, and "as designed" for Linux. What I mean is, that's the standard kernel OOM message that you get on STDERR when a process is killed. I don't think we can do any "better" than that, is all I'm saying. OOM is a catastrophic situation, especially during a software install, and nothing really "gracefully" dies.

> I still think it might be useful to do a quick memory check and, if there's less than 64MB available, for example, issue a warning.

Okay, I'll do that.
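
For illustration only, a check along the lines suggested might look roughly like this. It is a sketch, not the code that was actually committed; the 64MB threshold comes from the suggestion above, and `lowMemoryWarning` is a hypothetical name:

```javascript
const os = require('os');

// Threshold taken from the suggestion in this thread (an assumption,
// not a documented Cronicle requirement).
const MIN_FREE_BYTES = 64 * 1024 * 1024;

// Return a warning string if free memory is below the threshold,
// or null if there is enough.
function lowMemoryWarning(freeBytes, minBytes = MIN_FREE_BYTES) {
  if (freeBytes >= minBytes) return null;
  const freeMB = Math.floor(freeBytes / (1024 * 1024));
  const minMB = Math.floor(minBytes / (1024 * 1024));
  return 'WARNING: only ' + freeMB + ' MB of memory is free (' +
    minMB + ' MB recommended). The install may be killed.';
}

// Warn but carry on: the figure reported inside a container
// may not reflect the container's real limit.
const warning = lowMemoryWarning(os.freemem());
if (warning) console.warn(warning);
```

Printing a warning rather than aborting keeps the installer usable on systems where the reported number is misleading.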

> Cronicle is excellent and I've migrated so many things to it now that were previously scheduled on disparate platforms, so I'm incredibly grateful for the work you've put into it! Thank you!

You're welcome! Glad you like it! ❤️

jhuckaby commented 4 months ago

https://github.com/jhuckaby/Cronicle/commit/cebf921cb5c90ef66472e1381498b3f2121aab0b