spinkube / spin-plugin-kube

A Spin plugin for interacting with Kubernetes.

SpinKube x Rancher Desktop integration throws inconsistent results for more complex apps #80

Closed divya-mohan0209 closed 5 days ago

divya-mohan0209 commented 6 months ago

Context:

I tried out the SpinKube x Rancher Desktop integration detailed on this page. It works seamlessly for the hello-world application detailed there & on the Fermyon blog.

However, when I tried installing some of the other complex templates and containerizing them, such as

or even templates of my own

there is inconsistent behaviour, i.e. they sometimes work and most of the time, they don't.

This is when the spin applications themselves work fine on my machine.

What is the error?

The pods enter the CrashLoopBackOff stage and are terminated with the following message: Last state: Terminated with 137: Error.

Some additional notes

Lastly, I wasn't sure where the error was, so I filed it against this repo. I'll also open an issue against the Rancher Desktop issues GitHub repo.

Infrastructure details

radu-matei commented 6 months ago

Thanks for opening this, @divya-mohan0209! This should help triage where the issues should be (my assumption is most issues would come from https://github.com/spinkube/containerd-shim-spin).

Keeping this open until we create issues for each of these.

Thanks!

divya-mohan0209 commented 6 months ago

Okie dokie, thank you @radu-matei! I shall keep this in mind next time I open issues :)

radu-matei commented 6 months ago

Hey, @divya-mohan0209 -- just tried all the applications you referenced on a cluster with the latest release of SpinKube and the latest release of the shim, and couldn't reproduce it with any of them.

The most likely cause here I think would be running an old version of the shim -- which might come pre-baked into Rancher Desktop.

Could you please run kubectl annotate node --all kwasm.sh/kwasm-node=true one more time to force KWasm to update?

divya-mohan0209 commented 6 months ago

It is listed as one of the steps, but how can I check if the version is updated? And what is the expected version it needs to be updated to?

divya-mohan0209 commented 6 months ago

Also, the thing is it runs when you first run all the applications. But when you reset Rancher Desktop and retry the steps all over again, it doesn't work.

radu-matei commented 6 months ago

Yeah, I think it has something to do with the shim version used. @rajatjindal has a one-liner to verify the version.

In the meantime, tagging @tpmccallum who wrote the instructions for Rancher Desktop, if we need to update them.

divya-mohan0209 commented 6 months ago

Also, retried it just now by rerunning the script. No luck.

rajatjindal commented 6 months ago

I created the Rancher Desktop cluster and noticed the output below (it is indeed an old version of the shim).

@divya-mohan0209, could you please verify this on your cluster as well.

kubectl debug -it node/lima-rancher-desktop --image ubuntu:latest -n default -- /host/usr/local/containerd-shims/containerd-shim-spin-v2 -v

containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.11.1
  Revision: 7058f601f3e92ee

divya-mohan0209 commented 6 months ago

Just supplementing the error logs here, as well.

time="2024-04-30T12:59:26.854405727Z" level=info msg="CreateContainer within sandbox \"2939d8afcf2be62b2962cecfa7a0572f02f0da852f701bf1f9cf9260919e80a0\" for container &ContainerMetadata{Name:my-first-app,Attempt:0,}"
time="2024-04-30T12:59:26.856353602Z" level=info msg="CreateContainer within sandbox \"7b25fe8b3ff2a75d89bd1654396426f91f599aa00d7cc626089b28ffc8226dd3\" for &ContainerMetadata{Name:my-first-app,Attempt:0,} returns container id \"ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820\""
time="2024-04-30T12:59:26.856981436Z" level=info msg="StartContainer for \"ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820\""
time="2024-04-30T12:59:26.866778727Z" level=info msg="found manifest with WASM OCI image format."

time="2024-04-30T12:59:26.871746394Z" level=info msg="CreateContainer within sandbox \"2939d8afcf2be62b2962cecfa7a0572f02f0da852f701bf1f9cf9260919e80a0\" for &ContainerMetadata{Name:my-first-app,Attempt:0,} returns container id \"899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45\""
time="2024-04-30T12:59:26.872443811Z" level=info msg="StartContainer for \"899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45\""
time="2024-04-30T12:59:26.879040061Z" level=info msg="found manifest with WASM OCI image format."

time="2024-04-30T12:59:26.989514602Z" level=info msg="cgroup manager V2 will be used"

time="2024-04-30T12:59:26.997927102Z" level=info msg="cgroup manager V2 will be used"

time="2024-04-30T12:59:27.042846269Z" level=info msg="close_range; preserve_fds=0"

time="2024-04-30T12:59:27.043204978Z" level=warn msg="intermediate process already reaped"

time="2024-04-30T12:59:27.044068936Z" level=info msg="close_range; preserve_fds=0"

time="2024-04-30T12:59:27.044213894Z" level=warn msg="intermediate process already reaped"

time="2024-04-30T12:59:27.045217644Z" level=info msg="starting instance: ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820"

time="2024-04-30T12:59:27.045370853Z" level=info msg="calling start function"

time="2024-04-30T12:59:27.045397978Z" level=info msg="setting up wasi"

time="2024-04-30T12:59:27.046566228Z" level=info msg="starting instance: 899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45"

time="2024-04-30T12:59:27.046550103Z" level=info msg=" >>> configuring spin oci application 111"

time="2024-04-30T12:59:27.046705936Z" level=info msg="calling start function"

time="2024-04-30T12:59:27.046745853Z" level=info msg="setting up wasi"

time="2024-04-30T12:59:27.047655519Z" level=info msg="StartContainer for \"ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820\" returns successfully"
time="2024-04-30T12:59:27.046855769Z" level=info msg="writing artifact config to cache, near "/.cache/registry/manifests""

time="2024-04-30T12:59:27.052521603Z" level=info msg="StartContainer for \"899a0e4ca27f947e6b02c5d431b7cd4b5fcb9bfaa7369c0978fe4c8279c33b45\" returns successfully"
time="2024-04-30T12:59:27.057878186Z" level=info msg=" >>> configuring spin oci application 111"

time="2024-04-30T12:59:27.057913728Z" level=info msg="writing artifact config to cache, near "/.cache/registry/manifests""

time="2024-04-30T12:59:27.060346811Z" level=info msg="writing spin oci config to "/spin.json""

time="2024-04-30T12:59:27.064799728Z" level=info msg="writing spin oci config to "/spin.json""

time="2024-04-30T12:59:27.111433603Z" level=info msg="error running start function: failed to resolve content for component "my-first-app""

time="2024-04-30T12:59:27.112347144Z" level=info msg="error running start function: failed to resolve content for component "my-first-app""

time="2024-04-30T12:59:27.114542228Z" level=info msg="no child process"

time="2024-04-30T12:59:27.115303978Z" level=error msg="ttrpc: received message on inactive stream" stream=21
time="2024-04-30T12:59:27.115418061Z" level=info msg="deleting instance: ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820"

time="2024-04-30T12:59:27.115589936Z" level=info msg="cgroup manager V2 will be used"

time="2024-04-30T12:59:27.115984811Z" level=info msg="shim disconnected" id=ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820 namespace=k8s.io
time="2024-04-30T12:59:27.116002519Z" level=warning msg="cleaning up after shim disconnected" id=ca1f085e61081a93084dbe1c30e93a98ab7038e6c9ec5a1c119caabd363cd820 namespace=k8s.io                  
time="2024-04-30T12:59:27.116011894Z" level=info msg="cleaning up dead shim" namespace=k8s.io

divya-mohan0209 commented 6 months ago

It is!

kubectl debug -it node/lima-rancher-desktop --image ubuntu:latest -n default -- /host/usr/local/containerd-shims/containerd-shim-spin-v2 -v
Creating debugging pod node-debugger-lima-rancher-desktop-f6c9b with container debugger on node lima-rancher-desktop.
containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.11.1
  Revision: 7058f601f3e92ee

But I have run kubectl annotate node --all kwasm.sh/kwasm-node=true twice, and it still doesn't update the shim.

radu-matei commented 6 months ago

@divya-mohan0209 could you try:

kubectl annotate node --all kwasm.sh/kwasm-node-
kubectl annotate node --all kwasm.sh/kwasm-node=true

And check the jobs in the kwasm namespace, then the shim version?
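The sequence above can be sketched as one small shell function. The node name lima-rancher-desktop and the shim path are taken from the debug command earlier in this thread; the function name reprovision_kwasm is hypothetical, and the sketch assumes kubectl is already pointed at the Rancher Desktop cluster.

```shell
# Hypothetical helper: re-trigger KWasm provisioning, then print the shim version.
# The node name defaults to the one seen earlier in this thread.
reprovision_kwasm() {
  node="${1:-lima-rancher-desktop}"

  # A trailing "-" removes the annotation; "|| true" keeps going if removal reports an error.
  kubectl annotate node --all kwasm.sh/kwasm-node- || true
  # Re-adding the annotation asks the kwasm operator to provision the node again.
  kubectl annotate node --all kwasm.sh/kwasm-node=true

  # List the provisioning jobs and their pods in the kwasm namespace.
  kubectl get jobs,pods -n kwasm

  # Ask the shim binary on the node for its version through a debug pod.
  kubectl debug -it "node/$node" --image ubuntu:latest -n default -- \
    /host/usr/local/containerd-shims/containerd-shim-spin-v2 -v
}
```

Calling reprovision_kwasm with no argument targets lima-rancher-desktop; pass another node name as the first argument if yours differs. The provisioning job may still be running when the last command executes, so the version check may need to be repeated.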

divya-mohan0209 commented 6 months ago

Yep, not looking good still.

The jobs:

pod/lima-rancher-desktop-provision-kwasm-htwv8   0/1     Unknown     0             42s
pod/lima-rancher-desktop-provision-kwasm-cs2wf   0/1     Completed   0             31s

The shim version:

 ~ kubectl debug -it node/lima-rancher-desktop --image ubuntu:latest -n default -- /host/usr/local/containerd-shims/containerd-shim-spin-v2 -v
Creating debugging pod node-debugger-lima-rancher-desktop-rvrl2 with container debugger on node lima-rancher-desktop.
containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.11.1
  Revision: 7058f601f3e92ee

divya-mohan0209 commented 6 months ago

Also, checked the kwasm logs for ya

2024-04-30T14:00:06.644825673Z stderr F {"level":"info","node":"lima-rancher-desktop","time":"2024-04-30T14:00:06Z","message":"Label removed. Removing Job."}
2024-04-30T14:00:13.891517177Z stderr F {"level":"info","node":"lima-rancher-desktop","time":"2024-04-30T14:00:13Z","message":"Trying to Deploy on lima-rancher-desktop"}
2024-04-30T14:00:13.897735427Z stderr F {"level":"info","time":"2024-04-30T14:00:13Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:13.95474801Z stderr F {"level":"info","time":"2024-04-30T14:00:13Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:17.702458053Z stderr F {"level":"info","time":"2024-04-30T14:00:17Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:17.707615387Z stderr F {"level":"info","time":"2024-04-30T14:00:17Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:24.010167056Z stderr F {"level":"info","time":"2024-04-30T14:00:24Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:26.643764558Z stderr F {"level":"info","time":"2024-04-30T14:00:26Z","message":"Job lima-rancher-desktop-provision-kwasm is still Ongoing"}
2024-04-30T14:00:26.648906849Z stderr F {"level":"info","time":"2024-04-30T14:00:26Z","message":"Job lima-rancher-desktop-provision-kwasm is Completed. Happy WASMing"
lima-rancher-desktop:/var/log/pods/kwasm_lima-rancher-desktop-provision-kwasm-cs2wf_80b08b4a-fdb9-4ee0-a5d0-e7da89625e38/kwasm-provision$ sudo tail -f 0.log
2024-04-30T14:00:24.459592432Z stdout F No change in containerd/config.toml

rajatjindal commented 6 months ago

It seems the latest version as per kwasm-node-installer is indeed v0.11.1. I will open a PR to use the latest version in kwasm-node-installer.

Having said that, the instructions at https://www.spinkube.dev/docs/spin-operator/tutorials/integrating-with-rancher-desktop/ do refer to a different node-installer image, which refers to the latest spin-shim version.

@divya-mohan0209, could you please confirm what is the command you used to install the kwasm-operator? or this is the default version that comes with Rancher Desktop?

divya-mohan0209 commented 6 months ago

@divya-mohan0209, could you please confirm what is the command you used to install the kwasm-operator? or this is the default version that comes with Rancher Desktop?

I used the one in the SpinKube docs that you've listed above.

rajatjindal commented 6 months ago

could you pls share the output of

kubectl get pods -n kwasm -o wide

divya-mohan0209 commented 6 months ago

I'll definitely do that once I log in tomorrow and the app crashes. I had reset the entire thing for today's live code stream 🤣

divya-mohan0209 commented 6 months ago

Sorry for the delay in getting back! I had to re-do the steps :)

kubectl get pods -n kwasm -o wide
NAME                                         READY   STATUS      RESTARTS       AGE     IP           NODE                   NOMINATED NODE   READINESS GATES
lima-rancher-desktop-provision-kwasm-mxxfv   0/1     Completed   0              2d11h   <none>       lima-rancher-desktop   <none>           <none>
lima-rancher-desktop-provision-kwasm-5j7bt   0/1     Unknown     0              2d11h   <none>       lima-rancher-desktop   <none>           <none>
kwasm-operator-6c76c5f94b-hdb2h              1/1     Running     4 (5m1s ago)   2d11h   10.42.0.33   lima-rancher-desktop   <none>           <none>

jandubois commented 6 months ago

I've verified that this issue is caused by the old shim version and is fixed by using 0.14.1: https://github.com/rancher-sandbox/rancher-desktop/issues/6785#issuecomment-2103458271

My comment there also shows how you can upgrade the shim version in Rancher Desktop, which manages shims itself and shouldn't need kwasm at all as long as you use the right RuntimeClass name in your SpinAppExecutor (spin instead of wasmtime-spin-v2). I've written about this on Slack at https://cloud-native.slack.com/archives/C06PC7JA1EE/p1714674606796679.
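For illustration, a SpinAppExecutor that points at the spin RuntimeClass might look roughly like the manifest printed below. This is a sketch only: the apiVersion, the executor name containerd-shim-spin, and the field layout are assumed from the SpinKube docs and may differ for your operator version, and the function name spin_executor_manifest is hypothetical.

```shell
# Hypothetical helper: print a SpinAppExecutor manifest for a given RuntimeClass name.
# The apiVersion and field names are assumptions; verify them against your spin-operator release.
spin_executor_manifest() {
  runtime_class="${1:-spin}"
  cat <<EOF
apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinAppExecutor
metadata:
  name: containerd-shim-spin
spec:
  createDeployment: true
  deploymentConfig:
    runtimeClassName: ${runtime_class}
EOF
}
```

The output can be inspected first and then piped to the cluster, for example with spin_executor_manifest spin | kubectl apply -f -.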

Note that the next release of Rancher Desktop (1.14) will have an option to install spinkube (and the spin cli), so none of the manual setup should be necessary anymore (once it is released).
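To confirm that a node has moved past the old shim, the Version line printed by the -v command earlier in this thread can be compared with 0.14.1, the version reported above as fixing the issue. The following is a minimal sketch; the function names shim_version and version_at_least are hypothetical, and it assumes plain dotted numeric versions.

```shell
# Read `containerd-shim-spin-v2 -v` output on stdin and print the version number.
# Carriage returns are stripped because `kubectl debug -it` output may contain them.
shim_version() {
  tr -d '\r' | awk '$1 == "Version:" { print $2; exit }'
}

# Succeed if dotted version $1 is greater than or equal to dotted version $2.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, "."); m = split(want, w, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      a = h[i] + 0; b = w[i] + 0   # missing parts count as 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}
```

For example, saving the debug command's output to a file and running version_at_least "$(shim_version < shim.txt)" 0.14.1 returns a non-zero status for the 0.11.1 output shown earlier in the thread.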

bacongobbler commented 5 months ago

Thank you @jandubois for the confirmation. I just checked our documentation and it looks like we ask the user to install the latest version of Rancher Desktop. We will keep this ticket open until Rancher Desktop 1.14 has been released. We really appreciate you chiming in here and helping us confirm the issue. :)