project-akri / akri-on-krustlet

WebAssembly System Interface version for akri
Apache License 2.0

Errors in the Akri-on-Krustlet demo #4

Open rupipal opened 2 years ago

rupipal commented 2 years ago

Hi, I would have liked to try the demo (https://github.com/project-akri/akri-on-krustlet/blob/main/demo-krustlet.md) on k3d (earlier I could install Akri on k3d without any major issues, see https://github.com/project-akri/akri/issues/438), but I ran into errors installing the Krustlet node itself. Maybe that needs to be taken up with the Krustlet people. So I switched to kind. I can see the krustlet-wasi node in the cluster. However, I seem to have hit an error.

```console
~/akri$ RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=~/.kube/config DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd target/release/agent
akri.sh Agent start
akri.sh KUBERNETES_PORT found ... env_logger::init
[2021-12-17T13:05:31Z INFO akri_shared::akri::metrics] starting metrics server on port 8080 at /metrics
[2021-12-17T13:05:31Z INFO agent::util::registration] internal_run_registration_server - entered
[2021-12-17T13:05:31Z INFO agent::util::config_action] do_config_watch - enter
[2021-12-17T13:05:31Z INFO warp::server] Server::run; addr=0.0.0.0:8080
[2021-12-17T13:05:31Z INFO warp::server] listening on http://0.0.0.0:8080
[2021-12-17T13:05:31Z WARN kube::client] Unsuccessful data error parse: 404 page not found
```

```console
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: "404 page not found\n": Failed to parse error data', agent/src/main.rs:88:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:101:14
   2: core::result::unwrap_failed
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/result.rs:1617:5
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   4: tokio::runtime::task::harness::Harness<T,S>::poll
   5: std::thread::local::LocalKey<T>::with
   6: tokio::runtime::thread_pool::worker::Context::run_task
   7: tokio::runtime::thread_pool::worker::Context::run
   8: tokio::macros::scoped_tls::ScopedKey<T>::set
   9: tokio::runtime::thread_pool::worker::run
  10: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Error: JoinError::Panic(...)
```

The previous step showed:

```console
~/akri$ cargo build -p agent --release
    Updating git repository `https://github.com/kate-goldenring/h2`
    Updating git repository `https://github.com/DazWilkin/openapi-admission-v1`
  Downloaded crypto-mac v0.8.0
  Downloaded darling v0.12.4
  Downloaded float-cmp v0.8.0
  ...
```

```console
   Compiling kube-runtime v0.59.0
   Compiling akri-shared v0.7.11 (/home/rupinder/akri/shared)
warning: irrefutable `while let` pattern
   --> discovery-utils/src/discovery/mod.rs:231:27
    |
231 |     while let item = uds.accept().map_ok(|(st, _)| unix_stream::UnixStream(st)).await {
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(irrefutable_let_patterns)]` on by default
    = note: this pattern will always match, so the loop will never exit
    = help: consider instead using a `loop { ... }` with a `let` inside it

   Compiling akri-debug-echo v0.7.11 (/home/rupinder/akri/discovery-handlers/debug-echo)
warning: `akri-discovery-utils` (lib) generated 1 warning
warning: irrefutable `while let` pattern
   --> agent/src/util/registration.rs:189:19
    |
189 |     while let item = uds.accept().map_ok(|(st, _)| unix_stream::UnixStream(st)).await {
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(irrefutable_let_patterns)]` on by default
    = note: this pattern will always match, so the loop will never exit
    = help: consider instead using a `loop { ... }` with a `let` inside it

warning: irrefutable `while let` pattern
   --> agent/src/util/device_plugin_builder.rs:143:27
    |
143 |     while let item = uds.accept().map_ok(|(st, _)| unix_stream::UnixStream(st)).await {
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: this pattern will always match, so the loop will never exit
    = help: consider instead using a `loop { ... }` with a `let` inside it

warning: `agent` (bin "agent") generated 2 warnings
    Finished release [optimized] target(s) in 1m 46s
```

Regards, Rupinder

kate-goldenring commented 2 years ago

@rupipal great to hear you are trying it out. Based on where the panic occurred, it looks like the agent is having difficulty finding the Akri Configuration CRD. Just to double check, did you install Akri and the Controller in the previous step? You can confirm that the Configuration CRD has been applied to the cluster via Helm with `kubectl get crd configurations.akri.sh`. I was able to reproduce the error after deleting the Configuration CRD.
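For reference, a hedged sketch of the check and (re)install; the repo URL is the standard Akri Helm charts location, but the demo may pin a version or pass extra flags:

```sh
# Verify the Configuration CRD exists (created by the Akri Helm chart)
kubectl get crd configurations.akri.sh

# If it is missing, (re)install Akri
helm repo add akri-helm-charts https://project-akri.github.io/akri
helm install akri akri-helm-charts/akri
```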

rupipal commented 2 years ago

Hi @kate-goldenring, thanks for your reply. Yes, I think I missed that step; a slip on my part.

Now, this was my kind cluster (cluster1) to begin with.

```console
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS    RESTARTS   AGE
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running   0          5m23s
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running   0          5m23s
kube-system          etcd-cluster1-control-plane                      1/1     Running   0          5m27s
kube-system          kindnet-87dsk                                    1/1     Running   0          5m24s
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running   0          5m27s
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running   0          5m27s
kube-system          kube-proxy-xhwtk                                 1/1     Running   0          5m24s
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running   0          5m27s
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running   0          5m23s
```

Upon starting the Krustlet node, this is what I got.

```console
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS             RESTARTS   AGE
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running            0          18m
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running            0          18m
kube-system          etcd-cluster1-control-plane                      1/1     Running            0          18m
kube-system          kindnet-87dsk                                    0/1     CrashLoopBackOff   6          18m
kube-system          kindnet-pt888                                    0/1     Registered         0          10m
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running            0          18m
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running            0          18m
kube-system          kube-proxy-xhwtk                                 1/1     Running            0          18m
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running            0          18m
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running            0          18m
```
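(A hedged aside on those kindnet pods, since they recur below: the pod names come from the listing above, and the taint check assumes Krustlet registers its node with a wasm32-wasi taint, which would explain why a regular Linux DaemonSet pod gets stuck in `Registered` there:)

```sh
# Why is the original kindnet pod crash-looping?
kubectl -n kube-system logs kindnet-87dsk --previous

# Events for the replica that landed on the Krustlet node
kubectl -n kube-system describe pod kindnet-pt888

# Krustlet nodes typically carry a wasm32-wasi taint; check what's there
kubectl describe node krustlet-wasi | grep -A 3 -i taints
```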

The Krustlet node was deployed.

```console
$ kubectl get no
NAME                     STATUS   ROLES                  AGE   VERSION
cluster1-control-plane   Ready    control-plane,master   85m   v1.21.1
krustlet-wasi            Ready                           77m   1.0.0-alpha.1
```

The Akri controller gets deployed too.

```console
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS             RESTARTS   AGE
default              akri-controller-deployment-776897c88f-464gg     1/1     Running            0          42m
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running            0          63m
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running            0          63m
kube-system          etcd-cluster1-control-plane                      1/1     Running            0          63m
kube-system          kindnet-87dsk                                    0/1     CrashLoopBackOff   15         63m
kube-system          kindnet-pt888                                    0/1     Registered         0          55m
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running            0          63m
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running            0          63m
kube-system          kube-proxy-xhwtk                                 1/1     Running            0          63m
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running            0          63m
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running            0          63m
```

The gRPC proxy successfully connects with the Akri Agent and the input file seems to be written.

```console
[2021-12-18T16:19:12Z INFO dh_grpc_proxy] gRPC proxy running named as: debugEcho!
[2021-12-18T16:19:12Z INFO dh_grpc_proxy] Turning the server on!
[2021-12-18T16:19:12Z INFO akri_discovery_utils::registration_client] register_discovery_handler - entered
[2021-12-18T16:19:12Z INFO akri_discovery_utils::discovery::server] internal_run_discovery_server - entered
[2021-12-18T16:22:29Z INFO dh_grpc_proxy::discovery_handler] Connection established!
[2021-12-18T16:22:29Z INFO dh_grpc_proxy::discovery_handler] Input file written: {"descriptions":["foo0"]}
```

However, besides those two kindnet pods not coming up, the broker Wasm Pod doesn't come up either.

```console
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS             RESTARTS   AGE
default              akri-controller-deployment-776897c88f-464gg     1/1     Running            0          72m
default              wasi-debug-echo                                  0/1     Registered         0          25m
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running            0          93m
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running            0          93m
kube-system          etcd-cluster1-control-plane                      1/1     Running            0          93m
kube-system          kindnet-87dsk                                    0/1     CrashLoopBackOff   21         93m
kube-system          kindnet-pt888                                    0/1     Registered         0          86m
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running            0          93m
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running            0          93m
kube-system          kube-proxy-xhwtk                                 1/1     Running            0          93m
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running            0          93m
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running            0          93m
```

I did spend a lot of time looking for any missing steps. Here is where I am now :)

Rupinder

kate-goldenring commented 2 years ago

Hi @rupipal, that's definitely a lot of progress. Did you deploy the debug echo discovery handler YAML from this step? Your flow above is very descriptive and that step isn't in there, so I just wanted to check. I would at least expect erroring pods, since all that step is doing is deploying a standard Kubernetes Pod.
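For what it's worth, a hedged way to confirm the broker Pod from that YAML actually landed on the Krustlet node (the tolerations stanza is whatever the demo YAML sets; shown here only as the thing to look for):

```sh
# Was the Pod created, and on which node was it scheduled?
kubectl get pod wasi-debug-echo -o wide

# Does it carry the tolerations the demo YAML specifies for the wasm node?
kubectl get pod wasi-debug-echo -o yaml | grep -B 2 -A 6 -i tolerations
```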

rupipal commented 2 years ago

Hi @kate-goldenring

Yes, definitely. That's what causes (wasi-debug-echo 0/1 Registered 0 25m) to show up. So I'm trying to figure out what the id would be for `kubectl describe pod krustlet-wasi-akri-debug-echo--pod` (see the sketch below).

But even at 25m, it doesn't run.
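A hedged sketch for finding and describing it, using the `wasi-debug-echo` name from the listings above (the Events section of the describe output usually says why a pod never leaves its current state):

```sh
# List pod names and pick out the debug echo broker
kubectl get pods -o name | grep -i debug-echo

# Describe it; the Events section should say why it never starts running
kubectl describe pod wasi-debug-echo
```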

kate-goldenring commented 2 years ago

@rupipal do the logs of the agent show any issue creating the device plugins? Maybe an issue around creating a socket? The Agent may need to be run privileged
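If privileges turn out to be the issue, a hedged sketch of running the agent under sudo without losing the user kubeconfig (plain `sudo` resets `$HOME`, so the path has to be passed explicitly; assumes the sudoers policy allows env assignments):

```sh
# Only the relevant variables are shown; keep the HOST_* ones from the original command
sudo -E RUST_LOG=info RUST_BACKTRACE=1 \
  KUBECONFIG=/home/rupinder/.kube/config \
  DISCOVERY_HANDLERS_DIRECTORY=/home/rupinder/akri \
  AGENT_NODE_NAME=krustlet-wasi \
  target/release/agent
```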

rupipal commented 2 years ago

Hi @kate-goldenring

Here are the logs of the agent. They don't seem to show any issue. I tried with sudo; if I recall correctly, it starts looking for the kubeconfig under root's home directory and can't find it there.

```console
~$ RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=~/.kube/config DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd ./akri/target/release/agent
akri.sh Agent start
akri.sh KUBERNETES_PORT found ... env_logger::init
[2021-12-25T05:14:21Z INFO akri_shared::akri::metrics] starting metrics server on port 8080 at /metrics
[2021-12-25T05:14:21Z INFO agent::util::registration] internal_run_registration_server - entered
[2021-12-25T05:14:21Z INFO agent::util::config_action] do_config_watch - enter
[2021-12-25T05:14:21Z INFO warp::server] Server::run; addr=0.0.0.0:8080
[2021-12-25T05:14:21Z INFO warp::server] listening on http://0.0.0.0:8080
[2021-12-25T05:14:21Z INFO agent::util::config_action] handle_config - watcher started
[2021-12-25T05:22:09Z INFO agent::util::registration] register_discovery_handler - called with register request RegisterDiscoveryHandlerRequest { name: "debugEcho", endpoint: "/home/rupinder/akri/debugEcho.sock", endpoint_type: Uds, shared: true }
[2021-12-25T05:24:25Z INFO agent::util::config_action] handle_config - added or modified Configuration Some("akri-debug-echo")
[2021-12-25T05:24:25Z INFO agent::util::discovery_operator::start_discovery] start_discovery - entered for debugEcho discovery handler
```

Here are the logs of the gRPC proxy.

```console
~/akri-on-krustlet$ RUST_LOG=info DISCOVERY_HANDLER_NAME=debugEcho DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi ./target/release/dh-grpc-proxy
[2021-12-25T05:22:09Z INFO dh_grpc_proxy] gRPC proxy running named as: debugEcho!
[2021-12-25T05:22:09Z INFO dh_grpc_proxy] Turning the server on!
[2021-12-25T05:22:09Z INFO akri_discovery_utils::registration_client] register_discovery_handler - entered
[2021-12-25T05:22:09Z INFO akri_discovery_utils::discovery::server] internal_run_discovery_server - entered
[2021-12-25T05:24:25Z INFO dh_grpc_proxy::discovery_handler] Connection established!
[2021-12-25T05:24:25Z INFO dh_grpc_proxy::discovery_handler] Input file written: {"descriptions":["foo0"]}
```

kate-goldenring commented 2 years ago

Looks like the proxy and agent are running correctly. The Wasm debug echo discovery handler is not correctly reading the input file and writing to the output file. Can you share the logs of the debug echo discovery handler that was deployed in this step? (Something like the sketch below.)
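A hedged sketch, assuming Krustlet serves the standard logs endpoint for Wasm pods:

```sh
# Logs from the Wasm debug echo discovery handler broker
kubectl logs wasi-debug-echo

# The terminal (or journal) of the krustlet process itself may also
# surface errors from the Wasm workload
```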

rupipal commented 2 years ago

Hi @kate-goldenring, sorry for the long delay in replying; I was unwell. I re-did all the steps. As I mentioned earlier, the wasi debug echo discovery handler pod doesn't start running and isn't ready.

```console
$ kubectl get akrii,pods -o wide
NAME                                               READY   STATUS       RESTARTS   AGE    IP           NODE                     NOMINATED NODE   READINESS GATES
pod/akri-controller-deployment-776897c88f-f9wqh    1/1     Running      0          146m   10.244.0.5   cluster1-control-plane
pod/wasi-debug-echo                                0/1     Registered   0          118m                krustlet-wasi
```
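One more check that might localize the failure: the proxy already wrote its input file into `DISCOVERY_HANDLERS_DIRECTORY` (the `Input file written: {"descriptions":["foo0"]}` log above), so if the Wasm handler were healthy it should be reading that file and writing an output file alongside it. A hedged sketch; the glob is illustrative and actual file names may differ:

```sh
# Inspect the shared discovery handlers directory the agent and proxy use
ls -l ~/akri

# Look at whatever debugEcho-related files exist there (illustrative glob)
cat ~/akri/*debugEcho* 2>/dev/null
```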

kate-goldenring commented 2 years ago

Commenting here to revive this investigation. I will be unavailable for the next couple of weeks, but I will see if I can find a slot of time to rerun the demo and possibly repro the issue. @rodz in case you have time to debug.

rupipal commented 2 years ago

Thanks, Kate.
