spinkube / spin-operator

Spin Operator is a Kubernetes operator that empowers platform engineers to deploy Spin applications as custom resources to their Kubernetes clusters
https://www.spinkube.dev/docs/overview/

Deployment Issue: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured #289

Closed chokosabe closed 2 months ago

chokosabe commented 2 months ago

Trying to deploy a SpinApp to a k3s cluster running on Rocky Linux nodes.

I deployed this SpinApp:

apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinApp
metadata:
  name: ekko-consumer
spec:
  image: "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
  executor: containerd-shim-spin 
  replicas: 3
  imagePullSecrets:
    - name: ekko-consumer-registry  
  variables:
    - name: SPIN_VARIABLE_DATABASE_URL
      value: ${SPIN_VARIABLE_DATABASE_URL}
    - name: SPIN_VARIABLE_REDIS_URL
      value: ${SPIN_VARIABLE_REDIS_URL}
    - name: SPIN_VARIABLE_TIMESCALEDB_URL
      value: ${SPIN_VARIABLE_TIMESCALEDB_URL}

Started getting this error:

apiVersion: v1
count: 321
eventTime: null
firstTimestamp: '2024-08-02T12:48:08Z'
involvedObject:
  apiVersion: v1
  kind: Pod
  name: ekko-consumer-79b68b6ccc-8rfrm
  namespace: default
  resourceVersion: '121174374'
  uid: 9f4d67c9-78dc-423c-b8b6-d47683c340eb
kind: Event
lastTimestamp: '2024-08-02T13:58:16Z'
message: >-
  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get
  sandbox runtime: no runtime for "spin" is configured
metadata:
  creationTimestamp: '2024-08-02T12:48:08Z'
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:count: {}
        f:firstTimestamp: {}
        f:involvedObject: {}
        f:lastTimestamp: {}
        f:message: {}
        f:reason: {}
        f:source:
          f:component: {}
          f:host: {}
        f:type: {}
      manager: kubelet
      operation: Update
      time: '2024-08-02T13:58:16Z'
  name: ekko-consumer-79b68b6ccc-8rfrm.17e7ea334501cee9
  namespace: default
  resourceVersion: '121805734'
  uid: d8a44597-ef65-4f33-b696-2d94bcea6baf
reason: FailedCreatePodSandBox
reportingComponent: ''
reportingInstance: ''
source:
  component: kubelet
  host: worker-01

I assumed it was related to this and applied the change: https://github.com/deislabs/containerd-wasm-shims/issues/165

But the issue persisted.

At this point I'm just trying to find ways to debug this, e.g. what should /etc/containerd/config.toml look like?

After applying the fix above, should I have reinstalled the RuntimeClass or the Spin Operator?

Also, what check is being run that generates this error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get
  sandbox runtime: no runtime for "spin" is configured

For reference, the entry in /etc/containerd/config.toml looks like this:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
runtime_type = "io.containerd.spin.v2"

Failing all that, is there a safe distro for Kubernetes nodes that SpinKube works well with? I also noticed that the shim is installed to

/opt/kwasm/bin/containerd-shim-spin-v2

which is not on the $PATH.

vdice commented 2 months ago

Hi @chokosabe, I'm not experienced with Rocky Linux but I do have a working KinD cluster and the one thing that stands out to me is the difference between /etc/containerd/config.toml entries. For my cluster, I have:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
    runtime_type = "/opt/kwasm/bin/containerd-shim-spin-v2"

And, like you, /opt/kwasm/bin/containerd-shim-spin-v2 exists and can be run. When I log into the node, this binary isn't on the PATH either... at least for the root user.

Can you point me to the steps you followed to install SpinKube? Was it https://www.spinkube.dev/docs/install/installing-with-helm/?

The guides on the SpinKube docs site use the node-installer image from spinkube/containerd-shim-spin, e.g. ghcr.io/spinkube/containerd-shim-spin/node-installer:v0.15.1. The installer logic can be seen here. As the script shows, when installation is successful, runtime_type should be "'$KWASM_DIR'/bin/containerd-shim-spin-v2", so the value you've shown, "io.containerd.spin.v2", doesn't seem correct. It could be that something in the Rocky Linux setup thwarts that script.
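For context on why both forms of runtime_type can work: as I understand containerd's shim resolution, an absolute path is used as-is, while a name like io.containerd.spin.v2 is turned into a binary name (containerd-shim-spin-v2) that is searched for on containerd's own PATH, with a fallback to the directory of the containerd executable. That would explain why copying the shim into /usr/bin or /usr/local/bin makes the short form work. This is a simplified Python sketch; the function names and the `containerd_dir` parameter are hypothetical, and the real lookup has more cases.

```python
import os
import shutil
from typing import Optional


def shim_binary_name(runtime_type: str) -> str:
    """Map e.g. "io.containerd.spin.v2" to "containerd-shim-spin-v2"."""
    parts = runtime_type.split(".")
    if len(parts) < 2:
        return ""
    return f"containerd-shim-{parts[-2]}-{parts[-1]}"


def resolve_shim(
    runtime_type: str,
    search_path: Optional[str] = None,
    containerd_dir: Optional[str] = None,
) -> Optional[str]:
    """Return the shim binary containerd would likely exec, or None if not found."""
    # An absolute runtime_type such as /opt/kwasm/bin/containerd-shim-spin-v2
    # is used directly, provided the file exists.
    if os.path.isabs(runtime_type):
        return runtime_type if os.path.exists(runtime_type) else None
    name = shim_binary_name(runtime_type)
    if not name:
        return None
    # Otherwise search the containerd process's PATH (passed in explicitly,
    # since it may differ from a login shell's PATH).
    found = shutil.which(name, path=search_path)
    if found:
        return found
    # Fallback: a shim sitting next to the containerd binary.
    if containerd_dir:
        candidate = os.path.join(containerd_dir, name)
        if os.path.exists(candidate):
            return candidate
    return None
```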

chokosabe commented 2 months ago

Hi @vdice , thanks for the reply.

I've gone into the boxes and made the changes you outlined above. I think Rocky Linux and containerd don't work well together straight out of the box, and that affected the containerd service. With the changes made, is there any way to test that the shim is callable? Ideally the install script would include this.

vdice commented 2 months ago

With the changes having been made, is there any way to test that the shim is callable?

One test is to call it by its path when on the node:

root@kind-worker2:/# /opt/kwasm/bin/containerd-shim-spin-v2 -v
containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.15.1
  Revision: 57d595b1d3effda
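The same check could be scripted, for example as a post-install smoke test. A minimal Python sketch, assuming the shim prints the version block shown above when given -v; `check_shim` and `parse_shim_version` are hypothetical names.

```python
import subprocess


def parse_shim_version(output: str) -> dict:
    """Parse "Key: value" lines from `containerd-shim-spin-v2 -v` output."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        # The header line ("containerd-shim-spin-v2:") has no value and is skipped.
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def check_shim(path: str) -> dict:
    """Run the shim with -v and return the parsed fields.

    Raises FileNotFoundError/PermissionError if the binary cannot be executed,
    and RuntimeError if the output does not identify a spin runtime.
    """
    result = subprocess.run([path, "-v"], capture_output=True, text=True, timeout=10)
    # Read both streams, since which one the shim writes to is an assumption.
    info = parse_shim_version(result.stdout + result.stderr)
    if info.get("Runtime") != "spin":
        raise RuntimeError(f"unexpected output from {path}: {result.stdout!r}")
    return info
```

For example, check_shim("/opt/kwasm/bin/containerd-shim-spin-v2")["Version"] would return the version string on a node where the binary runs as shown above.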

So, after updating /etc/containerd/config.toml and restarting containerd, SpinApps continue to fail to run per the same error you provided above?

chokosabe commented 2 months ago

Hi,

Yes, I can see it when calling it on the node as well (I copied the binary to /usr/bin since all the other shims are there):

[root@staging-master-01 ~]# /usr/bin/containerd-shim-spin-v2 -v
containerd-shim-spin-v2:
  Runtime: spin
  Version: 0.15.1
  Revision: 57d595b1d3effda

I am still getting the error: "Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured"

I'm going to try removing the RuntimeClass and spin-operator and reinstalling them.

endocrimes commented 2 months ago

You could also use ctr run, but because it would pull and run an image it would be tricky to run by default 😅 (especially with more complex security policies in production clusters).

chokosabe commented 2 months ago

Yep, still getting the error. I deleted the RuntimeClass and Spin Operator and then reinstalled: exact same error, i.e. no change. It'd be great to know at which point the error is generated, i.e. what the request to the node that generates the error looks like.

For a test using ctr, is there a public Spin image that we could pull down? I can try with standard Docker images, but those will clearly fail.

vdice commented 2 months ago

Can you try the hello-world sample app? On my (kind) node, I did have to cp /opt/kwasm/bin/containerd-shim-spin-v2 /usr/local/bin/containerd-shim-spin-v2 for the ctr run ... command to work:


$ ctr image pull ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7
...
$ ctr run --rm --net-host --runtime io.containerd.spin.v2 ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7 hello-world bogus-arg

Serving http://0.0.0.0:80
Available Routes:
  hello-world: http://0.0.0.0:80 (wildcard)
chokosabe commented 2 months ago

Image Pull and ctr run both worked fine:

[root@staging-master-01 ~]# ctr image pull ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7
WARN[0001] reference for unknown type: application/vnd.fermyon.spin.application.v1+config  digest="sha256:78be88453c2877440926363508920ecd9522085921317796f89a93eca78025e1" mediatype=application/vnd.fermyo
ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7:              resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:7b343459ddd7787ab667e8fbc66e56cced3b22888cf2bfc629bbd6fa174c1d45: done           |++++++++++++++++++++++++++++++++++++++|
unknown-sha256:78be88453c2877440926363508920ecd9522085921317796f89a93eca78025e1:  done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:90ef88bcdc5269f95650b8cae0ffb42a62eb1ff8f44e39134fc37b555f06d314:   done           |++++++++++++++++++++++++++++++++++++++|
unknown-sha256:c88f155ea55d063afaebcb5bd4224934489108abfc799abbf5ea29f900ad8036:  done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.5 s                                                                    total:  507.0  (337.0 B/s)
unpacking linux/amd64 sha256:7b343459ddd7787ab667e8fbc66e56cced3b22888cf2bfc629bbd6fa174c1d45...
done: 4.632936ms
[root@staging-master-01 ~]# ctr run --rm --net-host --runtime io.containerd.spin.v2 ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7 hello-world bogus-arg
libunwind: __unw_add_dynamic_fde: bad fde: FDE is really a CIE

Serving http://0.0.0.0:80
Available Routes:
  hello-world: http://0.0.0.0:80 (wildcard)
chokosabe commented 2 months ago

It's like the nodes are unreachable by whatever is generating the error, hence the need to recreate how the shim is called by what I guess is the RuntimeClass.

chokosabe commented 2 months ago

The executor in the scaffolded SpinApp is called:

executor: containerd-shim-spin

Don't know if that might be having an effect.

vdice commented 2 months ago

The executor in the scaffolded SpinApp is called:

executor: containerd-shim-spin

That's fine/correct 👍

Are you able to tail the containerd logs when you create the SpinApp? Any errors/hints? Or is it not logging at all, i.e. not getting invoked?

chokosabe commented 2 months ago

Good idea!

containerd doesn't seem to be called at all (the dead-shim messages are from me shutting down the hello-world container).

[root@staging-master-01 containers]# sudo journalctl -u containerd.service -f -n 20
-- Logs begin at Wed 2024-03-27 19:25:26 CET. --
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.195710784+02:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.195759311+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.195776018+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.196112452+02:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0} UntrustedWorkloadRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0} Runtimes:map[runc:{Type:io.containerd.runc.v2 Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:true] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0} spin:{Type:/opt/kwasm/bin/containerd-shim-spin-v2 Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:registry.k8s.io/pause:3.6 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false 
MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false IgnoreDeprecationWarnings:[] DrainExecSyncIOTimeout:0s} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.196423584+02:00" level=info msg="Connect containerd service"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.196704787+02:00" level=info msg="Get image filesystem path \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\""
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.197600506+02:00" level=info msg="Start subscribing containerd event"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.197650820+02:00" level=info msg="Start recovering state"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.197698683+02:00" level=info msg="Start event monitor"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.197710762+02:00" level=info msg="Start snapshots syncer"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.197719777+02:00" level=info msg="Start cni network conf syncer for default"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.197727561+02:00" level=info msg="Start streaming server"
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.198071907+02:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.198120379+02:00" level=info msg=serving... address=/run/containerd/containerd.sock
Aug 02 18:11:08 staging-master-01 containerd[911]: time="2024-08-02T18:11:08.198516812+02:00" level=info msg="containerd successfully booted in 0.025399s"
Aug 02 18:11:08 staging-master-01 systemd[1]: Started containerd container runtime.
Aug 02 20:28:42 staging-master-01 containerd[911]: time="2024-08-02T18:28:42.989202772Z" level=error msg="listener accept got Custom { kind: Other, error: "listener shutdown for quit flag" }"
Aug 02 20:28:42 staging-master-01 containerd[911]: time="2024-08-02T20:28:42.989282640+02:00" level=info msg="shim disconnected" id=hello-world
Aug 02 20:28:42 staging-master-01 containerd[911]: time="2024-08-02T20:28:42.989622865+02:00" level=warning msg="cleaning up after shim disconnected" id=hello-world namespace=default
Aug 02 20:28:42 staging-master-01 containerd[911]: time="2024-08-02T20:28:42.989637215+02:00" level=info msg="cleaning up dead shim"
chokosabe commented 2 months ago

I think I might have found the culprit: the cluster is running rke2, which seems to have its own instance of containerd running.

Trying the hello-world example:

[root@staging-worker-01 ~]# sudo ctr --address /run/k3s/containerd/containerd.sock image pull ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7
WARN[0001] reference for unknown type: application/vnd.fermyon.spin.application.v1+config  digest="sha256:78be88453c2877440926363508920ecd9522085921317796f89a93eca78025e1" mediatype=application/vnd.fermyo
ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7:              resolved       |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:7b343459ddd7787ab667e8fbc66e56cced3b22888cf2bfc629bbd6fa174c1d45: done           |++++++++++++++++++++++++++++++++++++++|
unknown-sha256:78be88453c2877440926363508920ecd9522085921317796f89a93eca78025e1:  done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:90ef88bcdc5269f95650b8cae0ffb42a62eb1ff8f44e39134fc37b555f06d314:   done           |++++++++++++++++++++++++++++++++++++++|
unknown-sha256:c88f155ea55d063afaebcb5bd4224934489108abfc799abbf5ea29f900ad8036:  done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.6 s                                                                    total:  242.8  (151.7 KiB/s)
unpacking linux/amd64 sha256:7b343459ddd7787ab667e8fbc66e56cced3b22888cf2bfc629bbd6fa174c1d45...
done: 3.04194ms
[root@staging-worker-01 ~]# sudo ctr --address /run/k3s/containerd/containerd.sock run --rm --net-host --runtime io.containerd.spin.v2 ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7 hello-world bogus-arg
libunwind: __unw_add_dynamic_fde: bad fde: FDE is really a CIE

Serving http://0.0.0.0:80
Available Routes:
  hello-world: http://0.0.0.0:80 (wildcard)
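Since a node can end up with two containerd instances, one way to avoid testing the wrong one is to check which sockets exist and run the same smoke test against each. A hedged Python sketch: the socket paths are the two seen in this thread (the system containerd and the rke2-managed one), the helper names are hypothetical, and the ctr flags are only the ones used in the commands above. Each containerd has its own image store, so the image would need to be pulled with the same --address first.

```python
import os
from typing import List, Optional, Sequence

# Socket paths taken from this thread; other setups may use different ones.
CANDIDATE_SOCKETS = (
    "/run/containerd/containerd.sock",
    "/run/k3s/containerd/containerd.sock",
)

HELLO_WORLD = "ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7"


def existing_sockets(candidates: Sequence[str] = CANDIDATE_SOCKETS) -> List[str]:
    """Return the candidate containerd sockets present on this node."""
    return [path for path in candidates if os.path.exists(path)]


def ctr_run_argv(
    address: Optional[str],
    image: str = HELLO_WORLD,
    container_id: str = "hello-world",
    runtime: str = "io.containerd.spin.v2",
) -> List[str]:
    """Build the `ctr run` smoke-test command against a specific containerd socket."""
    argv = ["ctr"]
    if address:
        argv += ["--address", address]
    argv += ["run", "--rm", "--net-host", "--runtime", runtime, image, container_id, "bogus-arg"]
    return argv
```

Printing " ".join(ctr_run_argv(sock)) for each entry of existing_sockets() gives one command per containerd instance to try by hand.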

These are the logs, showing the same message we saw before:

[root@staging-worker-01 ~]# tail -f /var/lib/rancher/rke2/agent/containerd/containerd.log

time="2024-08-02T21:09:26.151061416+02:00" level=info msg="shim disconnected" id=hello-world
time="2024-08-02T21:09:26.151095541+02:00" level=warning msg="cleaning up after shim disconnected" id=hello-world namespace=default
time="2024-08-02T21:09:26.151106885+02:00" level=info msg="cleaning up dead shim"
time="2024-08-02T21:09:32.645750023+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:09:32.645827979+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:09:47.645772790+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:09:47.645836611+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:10:00.646150926+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:10:00.646241343+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:10:13.646023477+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:10:13.646101031+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
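When the log is this noisy, pulling out only the failed sandboxes helps confirm which containerd is receiving the kubelet's requests. A small Python sketch, assuming the log line format shown above; the regex and `failed_sandboxes` are hypothetical helpers, not part of containerd.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "RunPodSandbox ... failed, error" lines in the format shown above.
FAILED_SANDBOX = re.compile(
    r'msg="RunPodSandbox for &PodSandboxMetadata\{Name:(?P<pod>[^,]+),[^}]*\} failed, error"'
    r' error="(?P<error>.*)"$'
)


def failed_sandboxes(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (pod name, error) pairs for every failed RunPodSandbox line."""
    results = []
    for line in lines:
        match = FAILED_SANDBOX.search(line.rstrip("\n"))
        if match:
            # containerd escapes inner quotes as \" inside the error field.
            error = match.group("error").replace('\\"', '"')
            results.append((match.group("pod"), error))
    return results
```

Feeding it an open handle on /var/lib/rancher/rke2/agent/containerd/containerd.log yields one pair per failed attempt; the "RunPodSandbox for ..." info lines are skipped.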
chokosabe commented 2 months ago

I tried the deployment using the hello-world image. The same error still comes through, but ctr runs the same image fine when called locally.

apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinApp
metadata:
  name: ekko-consumer
spec:
  image: "ghcr.io/spinkube/spin-operator/hello-world:240724-103046-gb84d7"
  executor: containerd-shim-spin 
  replicas: 3
  imagePullSecrets:
    - name: ekko-consumer-registry  
  variables:
    - name: SPIN_VARIABLE_DATABASE_URL
      value: "[MASKED]"
    - name: SPIN_VARIABLE_REDIS_URL
      value: "[MASKED]"
    - name: SPIN_VARIABLE_TIMESCALEDB_URL
      value: "[MASKED]"
$ kubectl apply  -f spinapp.yaml
spinapp.core.spinoperator.dev/ekko-consumer configured
Job succeeded
time="2024-08-02T21:27:54.647416074+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:27:54.647509730+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:28:09.646474312+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:28:09.646557417+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:28:21.646493394+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:28:21.646567522+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:28:35.646184961+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:28:35.646281057+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:28:49.646400790+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:28:49.646486207+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:29:03.646142383+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:29:03.646221188+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:29:17.646482043+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:29:17.646556436+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
vdice commented 2 months ago

Nice find! Okay, so the other containerd process must be using a config.toml from a different location, right? /var/lib/rancher/rke2/agent/etc/containerd/config.toml perhaps? Can you try manually adding the spin shim configuration lines, restarting containerd and then seeing if the SpinApp runs? (Or is that what you already tried in https://github.com/spinkube/spin-operator/issues/289#issuecomment-2266018366?)

It seems like the node-installer script already has logic for rke2 here and here. I wonder where it is breaking down...
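Adding the lines by hand is easy to get subtly wrong when two config files are involved, so here is a small idempotent Python sketch that appends the spin runtime table only if it is missing. The default shim path is the one the node-installer writes according to this thread; `ensure_spin_runtime` is a hypothetical helper, and which file to edit for rke2 (the guess above is /var/lib/rancher/rke2/agent/etc/containerd/config.toml) is an assumption to verify. One caveat worth checking: rke2, like k3s, documents that it regenerates that config.toml at startup and uses a config.toml.tmpl in the same directory for customizations, so a hand edit to config.toml alone may be overwritten on restart.

```python
# Table header for the spin runtime in a version 2 containerd config.toml.
SPIN_HEADER = '[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]'
DEFAULT_SHIM = "/opt/kwasm/bin/containerd-shim-spin-v2"


def ensure_spin_runtime(config_text: str, shim_path: str = DEFAULT_SHIM) -> str:
    """Return config_text with a spin runtime table appended if none is present.

    Only looks for the exact table header; an existing table is left untouched,
    even if its runtime_type differs.
    """
    if SPIN_HEADER in config_text:
        return config_text
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return f'{config_text}\n{SPIN_HEADER}\n  runtime_type = "{shim_path}"\n'
```

After writing the result back, the containerd instance that reads the file still has to be restarted before the kubelet's requests see the new runtime.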

chokosabe commented 2 months ago

Hi @vdice, yes, that comment in #289 was me testing the rke2 ctr binary. Locally it can run the hello-world example, same as the main containerd setup. I restarted the service and tried again. Results are below, basically the same thing.

time="2024-08-02T21:49:03.937340356+02:00" level=info msg="CreateContainer within sandbox \"886c8dc2afde0e5dfc6531f0645accf71ab2bdcaeafef1e74e826c1ef5c420d6\" for &ContainerMetadata{Name:kube-proxy,Attempt:6,} returns container id \"e9c741ccaeea9ebd04d5208c41500bec74c93701af19cad1f51e23ba09c5c7bc\""
time="2024-08-02T21:49:03.937609773+02:00" level=info msg="StartContainer for \"e9c741ccaeea9ebd04d5208c41500bec74c93701af19cad1f51e23ba09c5c7bc\""
time="2024-08-02T21:49:03.999638674+02:00" level=info msg="RemoveContainer for \"a6696dcec863d3e53d359ba55ab4547b38c7c037c4a1dccb08a91fa7737ae01f\""
time="2024-08-02T21:49:04.012151080+02:00" level=info msg="RemoveContainer for \"a6696dcec863d3e53d359ba55ab4547b38c7c037c4a1dccb08a91fa7737ae01f\" returns successfully"
time="2024-08-02T21:49:04.022946253+02:00" level=info msg="StartContainer for \"e9c741ccaeea9ebd04d5208c41500bec74c93701af19cad1f51e23ba09c5c7bc\" returns successfully"
time="2024-08-02T21:49:32.327331121+02:00" level=info msg="CreateContainer within sandbox \"7044d2243d6db1f6a4bef189a58b6a071acd93eb8990d08275902e3971daea33\" for container &ContainerMetadata{Name:node-driver-registrar,Attempt:4,}"
time="2024-08-02T21:49:32.359431593+02:00" level=info msg="CreateContainer within sandbox \"7044d2243d6db1f6a4bef189a58b6a071acd93eb8990d08275902e3971daea33\" for &ContainerMetadata{Name:node-driver-registrar,Attempt:4,} returns container id \"00bcad9a7d25436dc6b27efb8d53a1ba739b3aa46f18ae267951a75148c9ecfd\""
time="2024-08-02T21:49:32.359743566+02:00" level=info msg="StartContainer for \"00bcad9a7d25436dc6b27efb8d53a1ba739b3aa46f18ae267951a75148c9ecfd\""
time="2024-08-02T21:49:32.456488566+02:00" level=info msg="StartContainer for \"00bcad9a7d25436dc6b27efb8d53a1ba739b3aa46f18ae267951a75148c9ecfd\" returns successfully"
time="2024-08-02T21:49:48.931950849+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:49:48.932035271+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:50:01.847975671+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:50:01.848057118+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:50:14.848337116+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:50:14.848420706+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:50:29.848015839+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:50:29.848096692+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:50:41.847840686+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:50:41.847928495+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:50:55.848394775+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:50:55.848473470+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:51:10.848116750+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:51:10.848246638+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:51:25.848616132+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:51:25.848692150+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:51:38.847702010+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:51:38.847778974+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:51:49.848207665+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:51:49.848315689+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:52:04.847800392+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:52:04.847876035+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:52:18.847926752+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:52:18.848008625+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:52:33.849206346+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:52:33.849308272+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:52:47.848118711+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:52:47.848208583+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:52:58.848626743+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:52:58.848695851+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:53:13.848843084+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:53:13.848923195+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"
time="2024-08-02T21:53:26.847729645+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,}"
time="2024-08-02T21:53:26.847801449+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ekko-consumer-5459d8848c-srlsq,Uid:5518130c-4d20-442d-829b-fb3e93e5210f,Namespace:default,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"spin\" is configured"

I think the key question is why a ctr call works locally but (assuming it's being called properly) the runtime can't be invoked via the RuntimeClass.
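A log dump like the one above is easier to reason about once each failed RunPodSandbox line is reduced to a pod name and an error. Below is a minimal sketch that assumes the logrus-style `time=... level=... msg=... error=...` format shown above. The helper names `parse_line` and `failed_sandboxes` are hypothetical and do not come from containerd or SpinKube.

```python
import re
from collections import Counter

# Matches key="quoted value" (allowing backslash-escaped quotes such as \"spin\")
# or key=bare_value, as written in the containerd log lines above.
FIELD = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"|(\w+)=(\S+)')
# Extracts the pod name from the PodSandboxMetadata dump inside msg.
POD = re.compile(r'PodSandboxMetadata\{Name:([^,]+),')


def parse_line(line):
    """Return a dict of the key/value fields in one containerd log line."""
    fields = {}
    for q_key, q_val, b_key, b_val in FIELD.findall(line):
        if q_key:
            fields[q_key] = q_val.replace('\\"', '"')  # unescape \" -> "
        else:
            fields[b_key] = b_val
    return fields


def failed_sandboxes(lines):
    """Count failed RunPodSandbox attempts per (pod name, error) pair."""
    counts = Counter()
    for line in lines:
        f = parse_line(line)
        if f.get("level") != "error" or "RunPodSandbox" not in f.get("msg", ""):
            continue
        pod = POD.search(f["msg"])
        counts[(pod.group(1) if pod else "?", f.get("error", ""))] += 1
    return counts
```

If every failure collapses into a single (pod, error) pair, such as `no runtime for "spin" is configured`, the problem probably lies in the node's runtime configuration and not in the application itself.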

vdice commented 2 months ago

@chokosabe just to triple-check, /var/lib/rancher/rke2/agent/etc/containerd/config.toml (as opposed to /etc/containerd/config.toml) contains the following lines, correct?

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
    runtime_type = "/opt/kwasm/bin/containerd-shim-spin-v2"

chokosabe commented 2 months ago

Yes

chokosabe commented 2 months ago

Going back, I think this might be the issue. The ctr binary managed by rke2 can pull the image down (with an error) but can't seem to run it. At first sight it looks like the image was corrupted while being pulled, but it's not. It might be this bug:

https://github.com/containerd/containerd/issues/10196

When trying to run the (Spin) image that's been pulled down, you get the same error:

ctr: mismatched image rootfs and manifest layers

[root@staging-worker-01 ~]# /var/lib/rancher/rke2/bin/ctr image pull ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7
ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7: resolving      |--------------------------------------| 
elapsed: 0.4 s                                                       total:   0.0 B (0.0 B/s)                                         
WARN[0000] reference for unknown type: application/vnd.fermyon.spin.application.v1+config  digest="sha256:78be88453c2877440926363508920ecd9522085921317796f89a93eca78025e1" mediatype=application/vnd.fermyon.spin.application.v1+config size=643
ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7:              resolving      |--------------------------------------| 
manifest-sha256:7b343459ddd7787ab667e8fbc66e56cced3b22888cf2bfc629bbd6fa174c1d45: exists         |++++++++++++++++++++++++++++++++++++++| 
ghcr.io/spinkube/spin-operator/hello-world:20240724-103046-gb8421d7:              resolved       |++++++++++++++++++++++++++++++++++++++| 
manifest-sha256:7b343459ddd7787ab667e8fbc66e56cced3b22888cf2bfc629bbd6fa174c1d45: exists         |++++++++++++++++++++++++++++++++++++++| 
unknown-sha256:78be88453c2877440926363508920ecd9522085921317796f89a93eca78025e1:  done           |++++++++++++++++++++++++++++++++++++++| 
unknown-sha256:c88f155ea55d063afaebcb5bd4224934489108abfc799abbf5ea29f900ad8036:  done           |++++++++++++++++++++++++++++++++++++++| 
config-sha256:90ef88bcdc5269f95650b8cae0ffb42a62eb1ff8f44e39134fc37b555f06d314:   done           |++++++++++++++++++++++++++++++++++++++| 
elapsed: 0.6 s                                                                    total:   0.0 B (0.0 B/s)                                         
unpacking linux/amd64 sha256:7b343459ddd7787ab667e8fbc66e56cced3b22888cf2bfc629bbd6fa174c1d45...
ctr: mismatched image rootfs and manifest layers
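The WARN line above shows that this artifact's config blob has the media type `application/vnd.fermyon.spin.application.v1+config` and not a regular container image config. One plausible reading, which is an assumption and was not confirmed in this thread, is that `ctr` tries to unpack the artifact as a container filesystem and cannot match its layers to a rootfs. The Spin shim, by contrast, consumes the artifact directly. The minimal sketch below classifies a manifest by its config media type. It assumes you have already fetched the manifest JSON with a registry client of your choice. `classify_manifest` is a hypothetical name.

```python
import json

# Config media types used by ordinary container images (OCI and Docker v2).
CONTAINER_CONFIG_TYPES = {
    "application/vnd.oci.image.config.v1+json",
    "application/vnd.docker.container.image.v1+json",
}
# Config media type reported for the Spin artifact in the ctr output above.
SPIN_CONFIG_TYPE = "application/vnd.fermyon.spin.application.v1+config"


def classify_manifest(manifest_json):
    """Return 'spin', 'container', or 'unknown' based on config.mediaType."""
    manifest = json.loads(manifest_json)
    media_type = manifest.get("config", {}).get("mediaType", "")
    if media_type == SPIN_CONFIG_TYPE:
        return "spin"
    if media_type in CONTAINER_CONFIG_TYPES:
        return "container"
    return "unknown"
```

If an artifact classifies as `spin`, the `ctr` unpack error is probably not evidence of a corrupted pull on its own terms.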

rajatjindal commented 2 months ago

I just tried on a fresh ubuntu server, and it seems to have worked for me. Here are the steps I took:

root@ubuntu-4gb-hel1-2:/# k get nodes
NAME                STATUS   ROLES                       AGE   VERSION
ubuntu-4gb-hel1-2   Ready    control-plane,etcd,master   24m   v1.28.11+rke2r1

root@ubuntu-4gb-hel1-2:/# k get spinapps
NAME             READY   DESIRED   EXECUTOR
simple-spinapp   1       1         containerd-shim-spin

root@ubuntu-4gb-hel1-2:/# k get pods
NAME                              READY   STATUS    RESTARTS   AGE
simple-spinapp-69fd8f7459-j9xbq   1/1     Running   0          5m52s

root@ubuntu-4gb-hel1-2:/# k port-forward svc/simple-spinapp 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080
Handling connection for 8080

root@ubuntu-4gb-hel1-2:~# curl http://localhost:8080/hello; echo
Hello world from Spin!
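In CI, the port-forward and curl check above can be turned into a smoke test. Below is a minimal sketch that uses only the Python standard library. It assumes a `kubectl port-forward` to local port 8080 is already running. `smoke_test` is a hypothetical name, and its default URL and expected text are taken from the session above.

```python
import urllib.request


def smoke_test(url="http://localhost:8080/hello",
               expected="Hello world from Spin!", timeout=5.0):
    """Return True if GET url answers 200 with a body containing expected."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected in body
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors
        # (urllib.error.HTTPError is a subclass of OSError).
        return False
```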

Here is my config.toml from the system:

# File generated by rke2. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/rke2/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "index.docker.io/rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/var/lib/rancher/rke2/agent/etc/containerd/certs.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
    runtime_type = "/opt/kwasm/bin/containerd-shim-spin-v2"

I am keeping my machine around in case you want to double-check any config on this working instance.

rajatjindal commented 2 months ago

I am also going to try with Rocky Linux next, and will report back.

rajatjindal commented 2 months ago

This is strange: it works for me on Rocky Linux too. I apologize in advance if I missed any of the config required to reproduce this issue.

[root@rocky-4gb-nbg1-2 /]# k get nodes
NAME               STATUS   ROLES                       AGE   VERSION
rocky-4gb-nbg1-2   Ready    control-plane,etcd,master   10m   v1.28.11+rke2r1

[root@rocky-4gb-nbg1-2 /]# k get spinapps
NAME             READY   DESIRED   EXECUTOR
simple-spinapp   1       1         containerd-shim-spin

[root@rocky-4gb-nbg1-2 /]# k get pods
NAME                              READY   STATUS    RESTARTS   AGE
simple-spinapp-69fd8f7459-g8v64   1/1     Running   0          2m29s

[root@rocky-4gb-nbg1-2 /]# k port-forward svc/simple-spinapp 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080
Handling connection for 8080

[root@rocky-4gb-nbg1-2 ~]# curl http://localhost:8080/hello; echo
Hello world from Spin!

Here is what my config.toml looks like:

[root@rocky-4gb-nbg1-2 /]# cat ./var/lib/rancher/rke2/agent/etc/containerd/config.toml
# File generated by rke2. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/rke2/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = true
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "index.docker.io/rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/var/lib/rancher/rke2/agent/etc/containerd/certs.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
    runtime_type = "/opt/kwasm/bin/containerd-shim-spin-v2"

chokosabe commented 2 months ago

Upgrading the version of rke2 on the cluster to try this - thanks. The tests above were using rke2 version 1.25 (installed early last year).

To be clear, in both cases you disabled the RKE2 agent?

Thanks.

rajatjindal commented 2 months ago

To be clear, in both cases you disabled the RKE2 agent?

Yes, because I was trying this in standalone mode. I'm also happy to jump on a call to compare notes if that is easier.

rajatjindal commented 2 months ago

Could you check the containerd version on the system where this is not working? We need a relatively recent version of containerd for this to work.

Specifically:

> 1.6.26-0 OR > 1.7.7-0
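This check can be scripted against a version string such as the output of `containerd --version`. Below is a minimal sketch. It reads the thresholds above as "at least 1.6.26 on the 1.6 line, at least 1.7.7 on the 1.7 line, and anything newer than 1.7 accepted". That reading is an interpretation of the `-0` pre-release notation, not an official rule. `containerd_supported` and `MINIMUMS` are hypothetical names.

```python
import re

# Minimum patch release per (major, minor) line, per the thresholds above.
# Assumption: release lines newer than 1.7 are treated as supported.
MINIMUMS = {(1, 6): 26, (1, 7): 7}


def containerd_supported(version_output):
    """Return True if the first x.y.z version in the text meets the minimum."""
    m = re.search(r"v?(\d+)\.(\d+)\.(\d+)", version_output)
    if not m:
        raise ValueError(f"no version found in {version_output!r}")
    major, minor, patch = map(int, m.groups())
    if (major, minor) in MINIMUMS:
        return patch >= MINIMUMS[(major, minor)]
    return (major, minor) > (1, 7)
```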

chokosabe commented 2 months ago

Many, many thanks for all the help on this. I finally got this to a point where it's working.

I can confirm the initial issues were down to using an older version of rke2 (v1.25). I fixed it by wiping everything and rebuilding with rke2 v1.28.

There were a couple of hiccups around applying the kwasm Helm chart: I had to run it again after installing the Spin Operator Helm chart (but this is a separate issue).

The only thing I'd add is that the different scripts we currently have to run could probably be condensed into two simple Helm charts. It would also help to add some debugging tips to the install instructions for when things don't work straight away.

Again, many thanks.

bacongobbler commented 2 months ago

Thank you for the feedback. That's good information that can help us improve the docs.

RE: consolidation, we have a ticket in the docs repo tracking this: https://github.com/spinkube/documentation/issues/122

If you have specific suggestions on how to troubleshoot this situation, I'll see if I can contribute updates to the QuickStart guide.