rupipal opened this issue 2 years ago (Open)
@rupipal great to hear you are trying it out. Based on where the panic occurred, it looks like it is having difficulties finding the Akri Configuration CRD. Just to double check, did you install Akri and the Controller in the previous step? You can confirm that the Configuration CRD has been applied to the cluster via Helm with kubectl get crd configurations.akri.sh. I was able to reproduce the error after deleting the Configuration CRD.
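For anyone following along, a quick check would look something like this (assuming a standard Akri install, which should also register an instances.akri.sh CRD):

$ kubectl get crd configurations.akri.sh instances.akri.sh

If either CRD is missing, re-running the Akri Helm install from the previous step should apply them.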
Hi @kate-goldenring, thanks for your reply. Yes, I think I missed that step; it was a slip on my part.
Now this was my kind cluster (cluster-1) to begin with.
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS    RESTARTS   AGE
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running   0          5m23s
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running   0          5m23s
kube-system          etcd-cluster1-control-plane                      1/1     Running   0          5m27s
kube-system          kindnet-87dsk                                    1/1     Running   0          5m24s
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running   0          5m27s
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running   0          5m27s
kube-system          kube-proxy-xhwtk                                 1/1     Running   0          5m24s
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running   0          5m27s
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running   0          5m23s
Upon starting the Krustlet node, this is what I got.
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS             RESTARTS   AGE
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running            0          18m
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running            0          18m
kube-system          etcd-cluster1-control-plane                      1/1     Running            0          18m
kube-system          kindnet-87dsk                                    0/1     CrashLoopBackOff   6          18m
kube-system          kindnet-pt888                                    0/1     Registered         0          10m
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running            0          18m
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running            0          18m
kube-system          kube-proxy-xhwtk                                 1/1     Running            0          18m
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running            0          18m
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running            0          18m
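For context, kindnet-pt888 is the kindnet DaemonSet replica that was scheduled onto the new krustlet-wasi node; since Krustlet runs WebAssembly modules rather than regular container images, that pod is unlikely ever to become Ready. The CrashLoopBackOff on kindnet-87dsk is a separate symptom; a sketch of how either could be inspected with plain kubectl:

$ kubectl describe pod -n kube-system kindnet-pt888
$ kubectl logs -n kube-system kindnet-87dsk --previous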
The Krustlet node was deployed.
$ kubectl get no
NAME STATUS ROLES AGE VERSION
cluster1-control-plane Ready control-plane,master 85m v1.21.1
krustlet-wasi Ready
The Akri controller gets deployed too.
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS             RESTARTS   AGE
default              akri-controller-deployment-776897c88f-464gg      1/1     Running            0          42m
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running            0          63m
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running            0          63m
kube-system          etcd-cluster1-control-plane                      1/1     Running            0          63m
kube-system          kindnet-87dsk                                    0/1     CrashLoopBackOff   15         63m
kube-system          kindnet-pt888                                    0/1     Registered         0          55m
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running            0          63m
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running            0          63m
kube-system          kube-proxy-xhwtk                                 1/1     Running            0          63m
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running            0          63m
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running            0          63m
The gRPC proxy successfully connects with the Akri Agent and the input file seems to be written.
[2021-12-18T16:19:12Z INFO dh_grpc_proxy] gRPC proxy running named as: debugEcho!
[2021-12-18T16:19:12Z INFO dh_grpc_proxy] Turning the server on!
[2021-12-18T16:19:12Z INFO akri_discovery_utils::registration_client] register_discovery_handler - entered
[2021-12-18T16:19:12Z INFO akri_discovery_utils::discovery::server] internal_run_discovery_server - entered
[2021-12-18T16:22:29Z INFO dh_grpc_proxy::discovery_handler] Connection established!
[2021-12-18T16:22:29Z INFO dh_grpc_proxy::discovery_handler] Input file written: {"descriptions":["foo0"]}
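Since discovery is reporting the foo0 device, the Agent would normally create a corresponding Akri Instance; a quick way to check, assuming the usual akric/akrii short names for Akri Configurations and Instances:

$ kubectl get akric,akrii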
However, besides those two kindnet pods not coming up, the broker Wasm Pod doesn't come up either.
$ kubectl get pods -A
NAMESPACE            NAME                                             READY   STATUS             RESTARTS   AGE
default              akri-controller-deployment-776897c88f-464gg      1/1     Running            0          72m
default              wasi-debug-echo                                  0/1     Registered         0          25m
kube-system          coredns-558bd4d5db-6vgxt                         1/1     Running            0          93m
kube-system          coredns-558bd4d5db-sghvw                         1/1     Running            0          93m
kube-system          etcd-cluster1-control-plane                      1/1     Running            0          93m
kube-system          kindnet-87dsk                                    0/1     CrashLoopBackOff   21         93m
kube-system          kindnet-pt888                                    0/1     Registered         0          86m
kube-system          kube-apiserver-cluster1-control-plane            1/1     Running            0          93m
kube-system          kube-controller-manager-cluster1-control-plane   1/1     Running            0          93m
kube-system          kube-proxy-xhwtk                                 1/1     Running            0          93m
kube-system          kube-scheduler-cluster1-control-plane            1/1     Running            0          93m
local-path-storage   local-path-provisioner-547f784dff-xhjzh          1/1     Running            0          93m
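A sketch of how the stuck wasi-debug-echo pod could be investigated (the pod name comes from the output above; since Krustlet was started in a terminal, its stdout/stderr is also worth watching for errors about pulling or instantiating the Wasm module):

$ kubectl describe pod wasi-debug-echo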
I did spend a lot of time looking for any missing steps. Here is where I am now :)
Rupinder
Hi @rupipal, that's definitely a lot of progress. Did you deploy the debug echo discovery handler yaml from this step? Your flow above is very descriptive and that step isn't in there, so I just wanted to check. I would at least expect erroring pods, since all that step is doing is deploying a standard Kubernetes Pod.
Hi @kate-goldenring
Yes, definitely. That's what causes (wasi-debug-echo 0/1 Registered 0 25m) to show up. So I'm trying to figure out what the id would be for
kubectl describe pod krustlet-wasi-akri-debug-echo-
But even at 25m, it doesn't run.
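For what it's worth, one way to look for the full broker pod name without guessing the id is to filter the pod list; a label selector may also work if the broker pods carry Akri's configuration label (an assumption, not verified here):

$ kubectl get pods -o wide | grep akri-debug-echo
$ kubectl get pods -l akri.sh/configuration=akri-debug-echo   # label key is an assumption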
@rupipal do the logs of the agent show any issue creating the device plugins? Maybe an issue around creating a socket? The Agent may need to be run privileged.
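One way to try running it privileged without losing the environment (plain sudo usually resets it, which is why the agent then cannot find the kubeconfig) is roughly the following, depending on the local sudoers policy:

$ export RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=$HOME/.kube/config DISCOVERY_HANDLERS_DIRECTORY=$HOME/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd
$ sudo -E ./akri/target/release/agent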
Hi @kate-goldenring
Here are the logs of the agent. They don't seem to show any issue. I tried with sudo; if I recall correctly, it then starts looking for the kubeconfig under root's home, where it can't find one.
~$ RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=~/.kube/config DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd ./akri/target/release/agent
akri.sh Agent start
akri.sh KUBERNETES_PORT found ... env_logger::init
[2021-12-25T05:14:21Z INFO akri_shared::akri::metrics] starting metrics server on port 8080 at /metrics
[2021-12-25T05:14:21Z INFO agent::util::registration] internal_run_registration_server - entered
[2021-12-25T05:14:21Z INFO agent::util::config_action] do_config_watch - enter
[2021-12-25T05:14:21Z INFO warp::server] Server::run; addr=0.0.0.0:8080
[2021-12-25T05:14:21Z INFO warp::server] listening on http://0.0.0.0:8080
[2021-12-25T05:14:21Z INFO agent::util::config_action] handle_config - watcher started
[2021-12-25T05:22:09Z INFO agent::util::registration] register_discovery_handler - called with register request RegisterDiscoveryHandlerRequest { name: "debugEcho", endpoint: "/home/rupinder/akri/debugEcho.sock", endpoint_type: Uds, shared: true }
[2021-12-25T05:24:25Z INFO agent::util::config_action] handle_config - added or modified Configuration Some("akri-debug-echo")
[2021-12-25T05:24:25Z INFO agent::util::discovery_operator::start_discovery] start_discovery - entered for debugEcho discovery handler
Here are the logs of the gRPC proxy.
~/akri-on-krustlet$ RUST_LOG=info DISCOVERY_HANDLER_NAME=debugEcho DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi ./target/release/dh-grpc-proxy
[2021-12-25T05:22:09Z INFO dh_grpc_proxy] gRPC proxy running named as: debugEcho!
[2021-12-25T05:22:09Z INFO dh_grpc_proxy] Turning the server on!
[2021-12-25T05:22:09Z INFO akri_discovery_utils::registration_client] register_discovery_handler - entered
[2021-12-25T05:22:09Z INFO akri_discovery_utils::discovery::server] internal_run_discovery_server - entered
[2021-12-25T05:24:25Z INFO dh_grpc_proxy::discovery_handler] Connection established!
[2021-12-25T05:24:25Z INFO dh_grpc_proxy::discovery_handler] Input file written: {"descriptions":["foo0"]}
It looks like the proxy and agent are running correctly. The Wasm debug echo discovery handler is not correctly reading the input file and writing to the output file. Can you share the logs of the debug echo discovery handler that was deployed in this step?
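If the module started at all, its logs would be the first thing to check; if it never ran, the logs may simply come back empty and the pod's events are the next stop:

$ kubectl logs wasi-debug-echo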
Hi @kate-goldenring, sorry for the long delay in replying; I was unwell. I re-did all the steps. As I mentioned earlier, the wasi debug echo discovery handler pod doesn't start running and isn't ready.
$ kubectl get akrii,pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/akri-controller-deployment-776897c88f-f9wqh 1/1 Running 0 146m 10.244.0.5 cluster1-control-plane
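At this point two quick checks might narrow things down: whether any Akri Instance was created at all, and whether the discovery handler pod was actually scheduled onto the krustlet-wasi node (both are standard kubectl queries):

$ kubectl get akrii -o yaml
$ kubectl get pods -o wide --field-selector spec.nodeName=krustlet-wasi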
Commenting here to revive this investigation. I will be unavailable for the next couple of weeks, but I will see if I can find a slot of time to rerun the demo and possibly reproduce the issue. @rodz, in case you have time to debug.
Thanks, Kate.
Hi, though I'd have liked to check the demo ( https://github.com/project-akri/akri-on-krustlet/blob/main/demo-krustlet.md ) on k3d (earlier I could install Akri on k3d without any major issues: https://github.com/project-akri/akri/issues/438 ), I ran into errors installing the Krustlet node itself. Maybe that needs to be taken up with the Krustlet people. So I switched to kind. I can see the krustlet-wasi node in the cluster. However, I seem to have hit an error.
~/akri$ RUST_LOG=info RUST_BACKTRACE=1 KUBECONFIG=~/.kube/config DISCOVERY_HANDLERS_DIRECTORY=~/akri AGENT_NODE_NAME=krustlet-wasi HOST_CRICTL_PATH=/usr/local/bin/crictl HOST_RUNTIME_ENDPOINT=/usr/local/bin/containerd HOST_IMAGE_ENDPOINT=/usr/local/bin/containerd target/release/agent
akri.sh Agent start
akri.sh KUBERNETES_PORT found ... env_logger::init
[2021-12-17T13:05:31Z INFO akri_shared::akri::metrics] starting metrics server on port 8080 at /metrics
[2021-12-17T13:05:31Z INFO agent::util::registration] internal_run_registration_server - entered
[2021-12-17T13:05:31Z INFO agent::util::config_action] do_config_watch - enter
[2021-12-17T13:05:31Z INFO warp::server] Server::run; addr=0.0.0.0:8080
[2021-12-17T13:05:31Z INFO warp::server] listening on http://0.0.0.0:8080
[2021-12-17T13:05:31Z WARN kube::client] Unsuccessful data error parse: 404 page not found
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: "404 page not found\n": Failed to parse error data', agent/src/main.rs:88:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:101:14
   2: core::result::unwrap_failed
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/result.rs:1617:5
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   4: tokio::runtime::task::harness::Harness<T,S>::poll
   5: std::thread::local::LocalKey<T>::with
   6: tokio::runtime::thread_pool::worker::Context::run_task
   7: tokio::runtime::thread_pool::worker::Context::run
   8: tokio::macros::scoped_tls::ScopedKey<T>::set
   9: tokio::runtime::thread_pool::worker::run
  10: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
Error: JoinError::Panic(...)

The step previous to this showed:

~/akri$ cargo build -p agent --release
    Updating git repository `https://github.com/kate-goldenring/h2`
    Updating git repository `https://github.com/DazWilkin/openapi-admission-v1`
  Downloaded crypto-mac v0.8.0
  Downloaded darling v0.12.4
  Downloaded float-cmp v0.8.0
...
warning: irrefutable `while let` pattern
  --> discovery-utils/src/discovery/mod.rs:231:27
warning: `akri-discovery-utils` (lib) generated 1 warning
warning: irrefutable `while let` pattern
  --> agent/src/util/registration.rs:189:19
warning: irrefutable `while let` pattern
  --> agent/src/util/device_plugin_builder.rs:143:27
warning: `agent` (bin "agent") generated 2 warnings
    Finished release [optimized] target(s) in 1m 46s

Regards,
Rupinder