hocus-dev / hocus

🪄 Spin up ready-to-code, disposable dev environments on your own servers. Self-hosted alternative to Gitpod and GitHub Codespaces.
https://hocus.dev

Failed to boot firecracker VM, failed to fetchRepository? #73

Open bgoosman opened 1 year ago

bgoosman commented 1 year ago

hocus-local-hocus-agent-1 | 2023-05-07T18:51:16.089Z [INFO] Booting firecracker VM with pid 254 took: 456.09 ms, TOTAL: 505.93 ms
hocus-local-hocus-agent-1 | 2023-05-07T18:51:18.255Z [WARN] firecracker process with pid 254 closed: 0
hocus-local-hocus-agent-1 | 2023-05-07T18:52:11.437Z [WARN] Activity failed {
hocus-local-hocus-agent-1 |   error: FetchError: The request failed and the interceptors did not return an alternative response
hocus-local-hocus-agent-1 |       at DefaultApi.BaseAPI.fetchApi (/app/node_modules/firecracker-client/dist/runtime.js:100:31)
hocus-local-hocus-agent-1 |       at processTicksAndRejections (node:internal/process/task_queues:96:5)
hocus-local-hocus-agent-1 |       at DefaultApi.request (/app/node_modules/firecracker-client/dist/runtime.js:136:26)
hocus-local-hocus-agent-1 |       ... 5 lines matching cause stack trace ...
hocus-local-hocus-agent-1 |       at /app/agent.js:2259:16 {
hocus-local-hocus-agent-1 |     cause: TypeError: fetch failed
hocus-local-hocus-agent-1 |         at fetch (/app/node_modules/undici/index.js:105:13)
hocus-local-hocus-agent-1 |         at processTicksAndRejections (node:internal/process/task_queues:96:5)
hocus-local-hocus-agent-1 |         at DefaultApi.BaseAPI.fetchApi (/app/node_modules/firecracker-client/dist/runtime.js:84:28)
hocus-local-hocus-agent-1 |         at DefaultApi.request (/app/node_modules/firecracker-client/dist/runtime.js:136:26)
hocus-local-hocus-agent-1 |         at DefaultApi.createSyncActionRaw (/app/node_modules/firecracker-client/dist/apis/DefaultApi.js:83:26)
hocus-local-hocus-agent-1 |         at DefaultApi.createSyncAction (/app/node_modules/firecracker-client/dist/apis/DefaultApi.js:96:9)
hocus-local-hocus-agent-1 |         at FirecrackerService.shutdownVM (/app/agent.js:3817:5)
hocus-local-hocus-agent-1 |         at FirecrackerService.withVM (/app/agent.js:3798:9)
hocus-local-hocus-agent-1 |         at /app/agent.js:4082:7
hocus-local-hocus-agent-1 |         at /app/agent.js:2259:16 {
hocus-local-hocus-agent-1 |       cause: [Error]
hocus-local-hocus-agent-1 |     }
hocus-local-hocus-agent-1 |   },
hocus-local-hocus-agent-1 |   durationMs: 55931,
hocus-local-hocus-agent-1 |   isLocal: false,
hocus-local-hocus-agent-1 |   attempt: 1,
hocus-local-hocus-agent-1 |   namespace: 'default',
hocus-local-hocus-agent-1 |   taskToken: '...',
hocus-local-hocus-agent-1 |   workflowId: '271718a2-c2e1-4e43-8ffc-695c43e5c6d0',
hocus-local-hocus-agent-1 |   workflowRunId: '6d4504b2-5b2d-4d8b-89d6-961d0605af71',
hocus-local-hocus-agent-1 |   workflowType: 'runBuildfsAndPrebuilds',
hocus-local-hocus-agent-1 |   activityId: '2',
hocus-local-hocus-agent-1 |   activityType: 'fetchRepository',
hocus-local-hocus-agent-1 |   taskQueue: 'main'
hocus-local-hocus-agent-1 | }

bgoosman commented 1 year ago

I'm on Bitbucket, and the project page gave me a green checkmark on the git repository, so I'm not sure what to do now. I already added the public SSH key to my workspace on Bitbucket.

hugodutka commented 1 year ago

It looks like the VM that fetchRepository uses was killed before the Hocus agent could shut it down normally. This could have happened because your host was out of memory and the OOM killer did its thing. Are you running it on a system that has less than 8-16GB of free RAM? Hocus is currently very memory hungry and we are working on reducing that.
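One way to test the OOM theory is to scan the host's kernel log for OOM-killer messages. The sketch below is only an illustration: the function name `find_oom_events` is hypothetical, and the two substrings it searches for are an assumption about how the kernel words these messages, which can vary between kernel versions.

```python
import re
from typing import List

# Substrings the Linux kernel commonly logs when the OOM killer runs.
# Assumption: this list may not cover every kernel version's wording.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory:", re.IGNORECASE)


def find_oom_events(kernel_log: str) -> List[str]:
    """Return the kernel log lines that look like OOM-killer activity."""
    return [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]
```

The input could be the text printed by `dmesg` on the host (reading the kernel log may require root). An empty result does not rule out memory pressure, but any hit around the time the firecracker process closed would support the theory.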

If memory is the culprit, an easy fix would be to create a swap file on your host. I would suggest a size of 16GB. https://linuxize.com/post/create-a-linux-swap-file/
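Before adding swap, it may help to see how much headroom the host has. This is a rough sketch, not anything Hocus ships: `parse_meminfo` and `headroom_gib` are hypothetical helper names, and the code assumes the usual `/proc/meminfo` layout where values are reported in kB.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: value} mapping (values in kB)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <integer> [unit]" lines; skip anything malformed.
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def headroom_gib(info: Dict[str, int]) -> float:
    """Available RAM plus free swap, converted from kB to GiB."""
    total_kb = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return total_kb / (1024 * 1024)
```

On the host, passing the contents of `/proc/meminfo` to `parse_meminfo` and comparing `headroom_gib(...)` against the 8-16GB figure mentioned above gives a quick sanity check.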

bgoosman commented 1 year ago

Hmm I should have 64 GB RAM total, on an Ubuntu 20.04 Focal OS.

top - 15:38:46 up 20:48,  1 user,  load average: 0.23, 0.55, 0.54
Tasks: 267 total,   1 running, 266 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  64241.8 total,  35695.6 free,    466.1 used,  28080.1 buff/cache
MiB Swap:  32735.0 total,  32735.0 free,      0.0 used.  63065.2 avail Mem

I forgot to mention that I got this warning, which I didn't know what to do with. (Sorry, not a Linux pro 😅)

> HOCUS_HOSTNAME="x" ops/bin/local-up.sh
[WARNING] Host kernel *might* be too old. If you encounter issues with nested virtualization please first try running Hocus on at least the 5.10 kernel
Building docker images 👷📦
Building vm-builder done in 1.32 s ✅
Building db-autosetup done in 0.52 s ✅
Building keycloak done in 0.80 s ✅
Building temporal-codec done in 1.34 s ✅
Building ui done in 0.82 s ✅
Building agent done in 0.88 s ✅
Pulling docker images 📥 - ✅ in 1.26 s
Building MicroVMs 👷🖥️ - ✅ in 4.21 s
Seeding the DB 🌱 - ✅ in 0.77 s
Starting the DB 📙 - ✅ in 2.00 s
Starting Keycloak 🔑 - ✅ in 11.01 s
Starting Temporal ☁️  - ✅ in 12.52 s
Starting Hocus 🧙🪄  - ✅ in 3.98 s

You may access Hocus here: http://x:3000/ Creds: dev/dev
Keycloak: http://x:4200/ Creds: admin/admin
Temporal: http://x:8080/

To delete all data ./ops/bin/local-cleanup.sh
To get debug logs: ./ops/bin/local-cmd.sh logs
To stop the deploy: ./ops/bin/local-cmd.sh down
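
The warning near the top of that output compares the host kernel against 5.10. Ubuntu 20.04 originally shipped with a 5.4 kernel, so a stock install would trip it. A minimal sketch of the same comparison follows; it is an illustration only, not the check that `local-up.sh` performs, and the helper names are hypothetical.

```python
import re
from typing import Tuple

# Minimum version taken from the warning text above.
MIN_KERNEL = (5, 10)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as the output of `uname -r`."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_recent_enough(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """True if the kernel's (major, minor) is at least the given minimum."""
    return parse_kernel_release(release) >= minimum
```

Passing `platform.release()` from the standard library (or the output of `uname -r`) to `kernel_is_recent_enough` shows whether the running kernel meets the suggested minimum.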

Hmm, I ran it as root, but some of the processes are showing up under a secondary account I created earlier.

top - 15:42:13 up 20:51,  1 user,  load average: 0.22, 0.46, 0.51
Tasks: 345 total,   2 running, 341 sleeping,   0 stopped,   2 zombie
%Cpu(s):  1.7 us,  0.6 sy,  0.0 ni, 97.2 id,  0.5 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  64241.8 total,  34705.6 free,   1400.9 used,  28135.2 buff/cache
MiB Swap:  32735.0 total,  32735.0 free,      0.0 used.  62099.2 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3153183 ben       20   0  843896 119528  44260 S   8.0   0.2   0:04.17 temporal-server
   6448 root      20   0 2171064  54520  34280 S   1.7   0.1  15:32.09 containerd
   6569 root      20   0 2743272 139344  61024 S   1.7   0.2  18:06.06 dockerd
3153479 70        20   0  174716  18744  15452 S   1.0   0.0   0:00.13 postgres
3154508 root      20   0  104.7g 374672  59088 S   1.0   0.6   0:12.05 node
3154130 root      20   0 1596484  95224  42048 S   0.7   0.1   0:01.48 node
      1 root      20   0  169012  12596   8512 S   0.3   0.0   4:01.05 systemd
    548 message+  20   0    7404   4520   3868 S   0.3   0.0   0:56.35 dbus-daemon
3138296 root      20   0   19268   9724   8032 S   0.3   0.0   0:00.97 systemd
3147404 root      20   0       0      0      0 I   0.3   0.0   0:00.03 kworker/10:4-events
3152434 root      20   0  720752   9720   6812 S   0.3   0.0   0:00.40 containerd-shim
3152662 ben       20   0 5547528 320828  29992 S   0.3   0.5   0:11.37 java
3153457 70        20   0  174420  23824  20852 S   0.3   0.0   0:00.16 postgres
3153480 70        20   0  173872  15716  12900 S   0.3   0.0   0:00.01 postgres
3154111 root      20   0  720752  10388   7388 S   0.3   0.0   0:00.35 containerd-shim
3170811 70        20   0  175236  19208  15352 S   0.3   0.0   0:00.04 postgres

Here are my prebuild limits. I just raised prebuild to 4096*4.

image

On the bright side, the git integration seems to be working, since a push to dev triggered a prebuild.

image

Raising the prebuild limit did not help 😞

gorbak25 commented 1 year ago

@bgoosman