iamhieupc closed this issue 7 months ago.
@hustchihieu143 please provide all relevant logs, info about your platform, and steps to reproduce
@hustchihieu143 Is the issue still relevant?
please re-open if still relevant.
I encountered the same issue when trying to deploy on CloudLab servers. The commands and output of my setup process are as follows:
Two c220g2 nodes on CloudLab with Intel x86 CPUs
# Clone vHive repo
git clone --depth=1 https://github.com/vhive-serverless/vhive.git
# Create a directory for vHive logs
mkdir -p /tmp/vhive-logs/
# Enter the cloned vHive repo
cd vhive
# Install Go via scripts
./scripts/install_go.sh; source /etc/profile
# Get the setup scripts
pushd scripts && go build -o setup_tool && popd && mv scripts/setup_tool .
# Run the standard node setup script (with firecracker)
./setup_tool setup_node firecracker
# Run the script that sets up the kubelet (with firecracker)
./setup_tool setup_worker_kubelet firecracker
# Start containerd
sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
# Start firecracker-containerd
sudo PATH=$PATH screen -dmS firecracker bash -c "/usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml > >(tee -a /tmp/vhive-logs/firecracker.stdout) 2> >(tee -a /tmp/vhive-logs/firecracker.stderr >&2)"
# Build vHive host orchestrator
source /etc/profile && go build
# Start vHive in a background terminal named vhive
sudo screen -dmS vhive bash -c "./vhive > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"
# Start containerd (on the master node)
sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
# Run the script that creates the multinode cluster (with firecracker)
./setup_tool create_multinode_cluster firecracker
# Use the masterKey.yaml to connect worker nodes to the master node
sudo kubeadm join <IP>:<PORT> --token <Token> --discovery-token-ca-cert-hash <Token Hash> > >(tee -a /tmp/vhive-logs/kubeadm_join.stdout) 2> >(tee -a /tmp/vhive-logs/kubeadm_join.stderr >&2)
Output on worker nodes after connection:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
I then entered y to the prompt in the terminal, and all pods in the cluster showed Running or Completed.
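For reference, the pod status can be verified on the master with:
# All pods should show Running or Completed
kubectl get pods --all-namespaces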
# Clone vSwarm repo
git clone --depth=1 --branch=v1.0 https://github.com/vhive-serverless/vSwarm ../vswarm
# Enter vSwarm repo
cd ../vswarm
# Change tools/deployer/functions.json to only deploy helloworld
# Deploy helloworld
source /etc/profile && pushd ./tools/deployer && go build && popd && ./tools/deployer/deployer -funcPath ~/vhive/configs/knative_workloads/
The status of helloworld loops between Error and CrashLoopBackOff.
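A first step to see why is to inspect the pod's events and restart history; a sketch (the pod name is illustrative and should be taken from kubectl get pods):
# Show events and per-container state of the crashing pod
kubectl describe pod helloworld-0-00001-deployment-77665cb678-5wphk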
I also tried adding a GUEST_ADDR field with value 10.0.1.1 (the same as the master IP address in masterKey.yaml) to configs/knative_workloads/helloworld.yaml, but the error persists. Is there anything I missed configuring?
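One way to check whether the variable actually lands in the pod spec is a sketch like the following (the pod name is illustrative):
# Look for GUEST_ADDR among the containers' env entries
kubectl get pod helloworld-0-00001-deployment-77665cb678-5wphk -o yaml | grep -A 1 GUEST_ADDR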
@JooyoungPark73 please check the following flow when troubleshooting.
1. The vHive process uses the networking module to create a virtual tap device and assigns an IP address to it.
2. The vHive process sets the GUEST_ADDR env variable inside the pod of the created instance.
3. The QP container retrieves the env var's value from its shell environment and passes it to the CreateVM call of the cri/ctriface functions.
The right way to debug is to look at the vHive logs (debug level) to check the first two steps. To check the third step, one needs to look at the QP container's logs using kubectl.
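A sketch of both checks, assuming the log paths from the setup commands above and an illustrative pod name (take the real one from kubectl get pods):
# Steps 1-2: search the vHive logs for tap/GUEST_ADDR activity
grep -i -e tap -e GUEST_ADDR /tmp/vhive-logs/vhive.stdout /tmp/vhive-logs/vhive.stderr
# Step 3: the QP (queue-proxy) container's logs, including the previously crashed instance
kubectl logs helloworld-0-00001-deployment-77665cb678-5wphk -c queue-proxy --previous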
@lrq619 I wonder why we don't encounter this problem when running the CRI tests. please discuss with JooYoung
Discussed with JooYoung; I think the difference might be caused by the fact that our CRI tests run on a single-node cluster, while the issue here is on a 2-node cluster.
time="2024-01-26T02:32:27.590352081-07:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:helloworld-0-00001-deployment-77665cb678-5wphk,Uid:438db12e-dd02-4c51-aa2c-5db15259fab1,Namespace:default,Attempt:0,}"
2024-01-26 02:32:27.662 [INFO][206026] plugin.go 327: Calico CNI found existing endpoint: &{{WorkloadEndpoint projectcalico.org/v3} {node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0 helloworld-0-00001-deployment-77665cb678- default 438db12e-dd02-4c51-aa2c-5db15259fab1 92722 0 2024-01-26 02:32:27 -0700 MST <nil> <nil> map[app:helloworld-0-00001 pod-template-hash:77665cb678 projectcalico.org/namespace:default projectcalico.org/orchestrator:k8s projectcalico.org/serviceaccount:default service.istio.io/canonical-name:helloworld-0 service.istio.io/canonical-revision:helloworld-0-00001 serving.knative.dev/configuration:helloworld-0 serving.knative.dev/configurationGeneration:1 serving.knative.dev/configurationUID:abd26e43-5d72-4397-add2-7b00450639b1 serving.knative.dev/revision:helloworld-0-00001 serving.knative.dev/revisionUID:0f8f90c4-6ae8-4e7e-8c3f-1188e54aa91e serving.knative.dev/service:helloworld-0 serving.knative.dev/serviceUID:0cfd9405-13f5-4607-8038-cf8db48df7ff] map[] [] [] []} {k8s node-1.issue-694.ntu-cloud.emulab.net helloworld-0-00001-deployment-77665cb678-5wphk eth0 default [] [] [kns.default ksa.default.default] cali327f1fc2001 [{user-port TCP 50051 0 } {http-queueadm TCP 8022 0 } {http-autometric TCP 9090 0 } {http-usermetric TCP 9091 0 } {queue-port TCP 8013 0 } {https-port TCP 8112 0 }] []}} ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-"
2024-01-26 02:32:27.662 [INFO][206026] k8s.go 74: Extracted identifiers for CmdAddK8s ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0"
2024-01-26 02:32:27.713 [INFO][206055] ipam_plugin.go 229: Calico CNI IPAM request count IPv4=1 IPv6=0 ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" HandleID="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Workload="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0"
2024-01-26 02:32:27.731 [INFO][206055] ipam_plugin.go 269: Auto assigning IP ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" HandleID="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Workload="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0" assignArgs=ipam.AutoAssignArgs{Num4:1, Num6:0, HandleID:(*string)(0xc000024550), Attrs:map[string]string{"namespace":"default", "node":"node-1.issue-694.ntu-cloud.emulab.net", "pod":"helloworld-0-00001-deployment-77665cb678-5wphk", "timestamp":"2024-01-26 09:32:27.713382481 +0000 UTC"}, Hostname:"node-1.issue-694.ntu-cloud.emulab.net", IPv4Pools:[]net.IPNet{}, IPv6Pools:[]net.IPNet{}, MaxBlocksPerHost:0, HostReservedAttrIPv4s:(*ipam.HostReservedAttr)(nil), HostReservedAttrIPv6s:(*ipam.HostReservedAttr)(nil), IntendedUse:"Workload"}
time="2024-01-26T02:32:27-07:00" level=info msg="About to acquire host-wide IPAM lock." source="ipam_plugin.go:357"
time="2024-01-26T02:32:27-07:00" level=info msg="Acquired host-wide IPAM lock." source="ipam_plugin.go:372"
2024-01-26 02:32:27.731 [INFO][206055] ipam.go 107: Auto-assign 1 ipv4, 0 ipv6 addrs for host 'node-1.issue-694.ntu-cloud.emulab.net'
2024-01-26 02:32:27.734 [INFO][206055] ipam.go 660: Looking up existing affinities for host handle="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.741 [INFO][206055] ipam.go 372: Looking up existing affinities for host host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.748 [INFO][206055] ipam.go 489: Trying affinity for 192.168.222.64/26 host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.751 [INFO][206055] ipam.go 155: Attempting to load block cidr=192.168.222.64/26 host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.757 [INFO][206055] ipam.go 232: Affinity is confirmed and block has been loaded cidr=192.168.222.64/26 host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.757 [INFO][206055] ipam.go 1180: Attempting to assign 1 addresses from block block=192.168.222.64/26 handle="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.760 [INFO][206055] ipam.go 1682: Creating new handle: k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e
2024-01-26 02:32:27.766 [INFO][206055] ipam.go 1203: Writing block in order to claim IPs block=192.168.222.64/26 handle="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.780 [INFO][206055] ipam.go 1216: Successfully claimed IPs: [192.168.222.101/26] block=192.168.222.64/26 handle="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" host="node-1.issue-694.ntu-cloud.emulab.net"
2024-01-26 02:32:27.781 [INFO][206055] ipam.go 847: Auto-assigned 1 out of 1 IPv4s: [192.168.222.101/26] handle="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" host="node-1.issue-694.ntu-cloud.emulab.net"
time="2024-01-26T02:32:27-07:00" level=info msg="Released host-wide IPAM lock." source="ipam_plugin.go:378"
2024-01-26 02:32:27.781 [INFO][206055] ipam_plugin.go 287: Calico CNI IPAM assigned addresses IPv4=[192.168.222.101/26] IPv6=[] ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" HandleID="k8s-pod-network.e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Workload="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0"
2024-01-26 02:32:27.785 [INFO][206026] k8s.go 383: Populated endpoint ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0" endpoint=&v3.WorkloadEndpoint{TypeMeta:v1.TypeMeta{Kind:"WorkloadEndpoint", APIVersion:"projectcalico.org/v3"}, ObjectMeta:v1.ObjectMeta{Name:"node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0", GenerateName:"helloworld-0-00001-deployment-77665cb678-", Namespace:"default", SelfLink:"", UID:"438db12e-dd02-4c51-aa2c-5db15259fab1", ResourceVersion:"92722", Generation:0, CreationTimestamp:time.Date(2024, time.January, 26, 2, 32, 27, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app":"helloworld-0-00001", "pod-template-hash":"77665cb678", "projectcalico.org/namespace":"default", "projectcalico.org/orchestrator":"k8s", "projectcalico.org/serviceaccount":"default", "service.istio.io/canonical-name":"helloworld-0", "service.istio.io/canonical-revision":"helloworld-0-00001", "serving.knative.dev/configuration":"helloworld-0", "serving.knative.dev/configurationGeneration":"1", "serving.knative.dev/configurationUID":"abd26e43-5d72-4397-add2-7b00450639b1", "serving.knative.dev/revision":"helloworld-0-00001", "serving.knative.dev/revisionUID":"0f8f90c4-6ae8-4e7e-8c3f-1188e54aa91e", "serving.knative.dev/service":"helloworld-0", "serving.knative.dev/serviceUID":"0cfd9405-13f5-4607-8038-cf8db48df7ff"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v3.WorkloadEndpointSpec{Orchestrator:"k8s", Workload:"", Node:"node-1.issue-694.ntu-cloud.emulab.net", ContainerID:"", Pod:"helloworld-0-00001-deployment-77665cb678-5wphk", Endpoint:"eth0", ServiceAccountName:"default", IPNetworks:[]string{"192.168.222.101/32"}, IPNATs:[]v3.IPNAT(nil), IPv4Gateway:"", IPv6Gateway:"", Profiles:[]string{"kns.default", "ksa.default.default"}, InterfaceName:"cali327f1fc2001", MAC:"", Ports:[]v3.WorkloadEndpointPort{v3.WorkloadEndpointPort{Name:"user-port", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0xc383, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"http-queueadm", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x1f56, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"http-autometric", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x2382, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"http-usermetric", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x2383, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"queue-port", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x1f4d, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"https-port", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x1fb0, HostPort:0x0, HostIP:""}}, AllowSpoofedSourcePrefixes:[]string(nil)}}
2024-01-26 02:32:27.785 [INFO][206026] k8s.go 384: Calico CNI using IPs: [192.168.222.101/32] ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0"
2024-01-26 02:32:27.785 [INFO][206026] dataplane_linux.go 68: Setting the host side veth name to cali327f1fc2001 ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0"
2024-01-26 02:32:27.788 [INFO][206026] dataplane_linux.go 473: Disabling IPv4 forwarding ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0"
2024-01-26 02:32:27.833 [INFO][206026] k8s.go 411: Added Mac, interface name, and active container ID to endpoint ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0" endpoint=&v3.WorkloadEndpoint{TypeMeta:v1.TypeMeta{Kind:"WorkloadEndpoint", APIVersion:"projectcalico.org/v3"}, ObjectMeta:v1.ObjectMeta{Name:"node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0", GenerateName:"helloworld-0-00001-deployment-77665cb678-", Namespace:"default", SelfLink:"", UID:"438db12e-dd02-4c51-aa2c-5db15259fab1", ResourceVersion:"92722", Generation:0, CreationTimestamp:time.Date(2024, time.January, 26, 2, 32, 27, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app":"helloworld-0-00001", "pod-template-hash":"77665cb678", "projectcalico.org/namespace":"default", "projectcalico.org/orchestrator":"k8s", "projectcalico.org/serviceaccount":"default", "service.istio.io/canonical-name":"helloworld-0", "service.istio.io/canonical-revision":"helloworld-0-00001", "serving.knative.dev/configuration":"helloworld-0", "serving.knative.dev/configurationGeneration":"1", "serving.knative.dev/configurationUID":"abd26e43-5d72-4397-add2-7b00450639b1", "serving.knative.dev/revision":"helloworld-0-00001", "serving.knative.dev/revisionUID":"0f8f90c4-6ae8-4e7e-8c3f-1188e54aa91e", "serving.knative.dev/service":"helloworld-0", "serving.knative.dev/serviceUID":"0cfd9405-13f5-4607-8038-cf8db48df7ff"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v3.WorkloadEndpointSpec{Orchestrator:"k8s", Workload:"", Node:"node-1.issue-694.ntu-cloud.emulab.net", ContainerID:"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e", Pod:"helloworld-0-00001-deployment-77665cb678-5wphk", Endpoint:"eth0", ServiceAccountName:"default", IPNetworks:[]string{"192.168.222.101/32"}, IPNATs:[]v3.IPNAT(nil), IPv4Gateway:"", IPv6Gateway:"", Profiles:[]string{"kns.default", "ksa.default.default"}, InterfaceName:"cali327f1fc2001", MAC:"ae:08:39:ad:b4:e0", Ports:[]v3.WorkloadEndpointPort{v3.WorkloadEndpointPort{Name:"user-port", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0xc383, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"http-queueadm", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x1f56, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"http-autometric", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x2382, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"http-usermetric", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x2383, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"queue-port", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x1f4d, HostPort:0x0, HostIP:""}, v3.WorkloadEndpointPort{Name:"https-port", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x1fb0, HostPort:0x0, HostIP:""}}, AllowSpoofedSourcePrefixes:[]string(nil)}}
2024-01-26 02:32:27.850 [INFO][206026] k8s.go 489: Wrote updated endpoint to datastore ContainerID="e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e" Namespace="default" Pod="helloworld-0-00001-deployment-77665cb678-5wphk" WorkloadEndpoint="node--1.issue--694.ntu--cloud.emulab.net-k8s-helloworld--0--00001--deployment--77665cb678--5wphk-eth0"
time="2024-01-26T02:32:27.873474937-07:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2024-01-26T02:32:27.873638893-07:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2024-01-26T02:32:27.873673567-07:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-01-26T02:32:27.874027820-07:00" level=info msg="starting signal loop" namespace=k8s.io path=/run/containerd/io.containerd.runtime.v2.task/k8s.io/e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e pid=206102 runtime=io.containerd.runc.v2
time="2024-01-26T02:32:27.980242536-07:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:helloworld-0-00001-deployment-77665cb678-5wphk,Uid:438db12e-dd02-4c51-aa2c-5db15259fab1,Namespace:default,Attempt:0,} returns sandbox id \"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e\""
time="2024-01-26T02:32:27.984153292-07:00" level=info msg="CreateContainer within sandbox \"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e\" for container &ContainerMetadata{Name:user-container,Attempt:0,}"
time="2024-01-26T02:32:27.994925994-07:00" level=info msg="CreateContainer within sandbox \"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e\" for &ContainerMetadata{Name:user-container,Attempt:0,} returns container id \"49f3d6b7249fbe056887cf7a9698f70c27340cd7fef4b5686d76fe69fac93f31\""
time="2024-01-26T02:32:27.995505272-07:00" level=info msg="StartContainer for \"49f3d6b7249fbe056887cf7a9698f70c27340cd7fef4b5686d76fe69fac93f31\""
time="2024-01-26T02:32:28.091302057-07:00" level=info msg="StartContainer for \"49f3d6b7249fbe056887cf7a9698f70c27340cd7fef4b5686d76fe69fac93f31\" returns successfully"
time="2024-01-26T02:32:28.095419461-07:00" level=info msg="CreateContainer within sandbox \"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e\" for container &ContainerMetadata{Name:queue-proxy,Attempt:0,}"
time="2024-01-26T02:32:28.105626233-07:00" level=info msg="CreateContainer within sandbox \"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e\" for &ContainerMetadata{Name:queue-proxy,Attempt:0,} returns container id \"e16be82b8c93bc0fcac6d948e78d78e63ed851709b0c7b9ca77029a3ac2d21e2\""
time="2024-01-26T02:32:28.106137072-07:00" level=info msg="StartContainer for \"e16be82b8c93bc0fcac6d948e78d78e63ed851709b0c7b9ca77029a3ac2d21e2\""
time="2024-01-26T02:32:28.206664344-07:00" level=info msg="StartContainer for \"e16be82b8c93bc0fcac6d948e78d78e63ed851709b0c7b9ca77029a3ac2d21e2\" returns successfully"
time="2024-01-26T02:32:28.258634771-07:00" level=info msg="shim disconnected" id=e16be82b8c93bc0fcac6d948e78d78e63ed851709b0c7b9ca77029a3ac2d21e2
time="2024-01-26T02:32:28.258716686-07:00" level=warning msg="cleaning up after shim disconnected" id=e16be82b8c93bc0fcac6d948e78d78e63ed851709b0c7b9ca77029a3ac2d21e2 namespace=k8s.io
time="2024-01-26T02:32:28.258738468-07:00" level=info msg="cleaning up dead shim"
time="2024-01-26T02:32:28.274822512-07:00" level=warning msg="cleanup warnings time=\"2024-01-26T02:32:28-07:00\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=206230 runtime=io.containerd.runc.v2\n"
time="2024-01-26T02:32:28.848673392-07:00" level=info msg="CreateContainer within sandbox \"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e\" for container &ContainerMetadata{Name:queue-proxy,Attempt:1,}"
time="2024-01-26T02:32:28.857456924-07:00" level=info msg="CreateContainer within sandbox \"e097406be795cf36ab808b9fcc1c7631bd864dcfe86a3ae2efca6d187742474e\" for &ContainerMetadata{Name:queue-proxy,Attempt:1,} returns container id \"2df92d746d6399be2146aae7aa995d3d9888910ee2fa9261bca7ac3608e4378d\""
time="2024-01-26T02:32:28.858022175-07:00" level=info msg="StartContainer for \"2df92d746d6399be2146aae7aa995d3d9888910ee2fa9261bca7ac3608e4378d\""
time="2024-01-26T02:32:28.946321148-07:00" level=info msg="StartContainer for \"2df92d746d6399be2146aae7aa995d3d9888910ee2fa9261bca7ac3608e4378d\" returns successfully"
time="2024-01-26T02:32:28.986701354-07:00" level=info msg="shim disconnected" id=2df92d746d6399be2146aae7aa995d3d9888910ee2fa9261bca7ac3608e4378d
time="2024-01-26T02:32:28.986783309-07:00" level=warning msg="cleaning up after shim disconnected" id=2df92d746d6399be2146aae7aa995d3d9888910ee2fa9261bca7ac3608e4378d namespace=k8s.io
time="2024-01-26T02:32:28.986804597-07:00" level=info msg="cleaning up dead shim"
time="2024-01-26T02:32:29.002706430-07:00" level=warning msg="cleanup warnings time=\"2024-01-26T02:32:28-07:00\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=206324 runtime=io.containerd.runc.v2\n"
time="2024-01-26T02:32:29.853389244-07:00" level=info msg="RemoveContainer for \"e16be82b8c93bc0fcac6d948e78d78e63ed851709b0c7b9ca77029a3ac2d21e2\""
time="2024-01-26T02:32:29.857671513-07:00" level=info msg="RemoveContainer for \"e16be82b8c93bc0fcac6d948e78d78e63ed851709b0c7b9ca77029a3ac2d21e2\" returns successfully"
No logs for vHive and Firecracker
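The redirected log files from the setup commands above can be inspected directly to confirm this:
# Check the vHive and firecracker-containerd log files collected via tee
cat /tmp/vhive-logs/vhive.stdout /tmp/vhive-logs/vhive.stderr
cat /tmp/vhive-logs/firecracker.stdout /tmp/vhive-logs/firecracker.stderr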
I think I found step 1:
ipam.go 1216: Successfully claimed IPs: [192.168.222.101/26] ...
ipam.go 847: Auto-assigned 1 out of 1 IPv4s: [192.168.222.101/26] ...
...
k8s.go 383: Populated endpoint ContainerID= ...
k8s.go 411: Added Mac, interface name, and active container ID to endpoint ContainerID= ...
But I did not find step 2, and then the log goes:
level=info msg="StartContainer for [Proxy Container] returns successfully"
level=info msg="shim disconnected" id=[Proxy Container]
level=warning msg="cleaning up after shim disconnected" id=[Proxy Container] namespace=k8s.io
I changed container names for readability.
After executing:
sudo mv /var/lib/firecracker-containerd/runtime/vmlinux-5.10.186 /var/lib/firecracker-containerd/runtime/hello-vmlinux.bin
sudo chmod +x /var/lib/firecracker-containerd/runtime/hello-vmlinux.bin
The workflow on a single-node cluster works properly; below are the vHive logs:
INFO[2024-01-26T06:29:43.765891652-06:00] HEARTBEAT: number of active VMs: 0
INFO[2024-01-26T06:29:43.910730096-06:00] FuncPool heartbeat: ==== Stats by cold functions ====
fID, #started, #served
===================================
INFO[2024-01-26T06:30:43.765670957-06:00] HEARTBEAT: number of active VMs: 0
INFO[2024-01-26T06:30:43.911008195-06:00] FuncPool heartbeat: ==== Stats by cold functions ====
fID, #started, #served
===================================
INFO[2024-01-26T06:31:43.766150454-06:00] HEARTBEAT: number of active VMs: 0
INFO[2024-01-26T06:31:43.910439769-06:00] FuncPool heartbeat: ==== Stats by cold functions ====
fID, #started, #served
===================================
INFO[2024-01-26T06:32:43.765935983-06:00] HEARTBEAT: number of active VMs: 0
INFO[2024-01-26T06:32:43.911296042-06:00] FuncPool heartbeat: ==== Stats by cold functions ====
fID, #started, #served
===================================
INFO[2024-01-26T06:33:43.765688754-06:00] HEARTBEAT: number of active VMs: 0
INFO[2024-01-26T06:33:43.910918952-06:00] FuncPool heartbeat: ==== Stats by cold functions ====
fID, #started, #served
===================================
INFO[2024-01-26T06:34:43.765625713-06:00] HEARTBEAT: number of active VMs: 0
INFO[2024-01-26T06:34:43.910639986-06:00] FuncPool heartbeat: ==== Stats by cold functions ====
fID, #started, #served
===================================
@CuriousGeorgiy Can you take a look at this issue, and give some comments? Maybe some pointers?
@JooyoungPark73 as far as I remember, we did test the multinode setup when I was working on #816.
Regarding https://github.com/vhive-serverless/vHive/issues/694#issuecomment-1912005945:
I have also had problems with the basic kernel binary (hello-vmlinux) when creating a fresh default-rootfs image for SSH experiments, so it's definitely worth updating.
AFAICT, GUEST_ADDR is set here:
https://github.com/vhive-serverless/vHive/blob/7e82cb132bca2b61b8bd57e2466c604cc6df6078/cri/firecracker/service.go#L163-L168
Not sure what could go wrong, but I would add a log message there to make sure all the environment variables are present and to trace the root cause.
Other than that, I haven't really worked a lot with or debugged the k8s setup (since I almost always had setup issues on CloudLab), so I can't really help you out here.
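As a quicker check than adding a log message to the code, one could also verify from outside whether the variable is present in the QP container's environment; a sketch (this only works while the container is up, and the pod name is illustrative):
# Print GUEST_ADDR as seen inside the queue-proxy container
kubectl exec helloworld-0-00001-deployment-77665cb678-5wphk -c queue-proxy -- env | grep GUEST_ADDR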
@JooyoungPark73 we need to give this fix priority now.
When I deploy a service with the following YAML file:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: aes-python
  namespace: default
spec:
  template:
    spec:
      containers: