vhive-serverless / vHive

vHive: Open-source framework for serverless experimentation
MIT License
281 stars, 86 forks

Cannot deploy functions on Master #743

Closed: pzg250 closed this issue 1 year ago

pzg250 commented 1 year ago

Describe the bug
After installing vHive on two AWS EC2 instances running Ubuntu 20.04, functions cannot be deployed.

To Reproduce

  1. On master and worker instances.

  2. git clone --depth=1 https://github.com/vhive-serverless/vhive.git

  3. cd vhive

  4. mkdir -p /tmp/vhive-logs

  5. ./scripts/cloudlab/setup_node.sh stock-only use-stargz > >(tee -a /tmp/vhive-logs/setup_node.stdout) 2> >(tee -a /tmp/vhive-logs/setup_node.stderr >&2)

  6. On the worker instance

  7. ./scripts/cluster/setup_worker_kubelet.sh stock-only > >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stdout) 2> >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stderr >&2)

  8. sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"

  9. sudo PATH=$PATH screen -dmS firecracker bash -c "/usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml > >(tee -a /tmp/vhive-logs/firecracker.stdout) 2> >(tee -a /tmp/vhive-logs/firecracker.stderr >&2)"

  10. source /etc/profile && go build

  11. sudo screen -dmS vhive bash -c "./vhive > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"

  12. On the Master instance

  13. sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"

  14. ./scripts/cluster/create_multinode_cluster.sh stock-only > >(tee -a /tmp/vhive-logs/create_multinode_cluster.stdout) 2> >(tee -a /tmp/vhive-logs/create_multinode_cluster.stderr >&2)

  15. On another Master instance terminal

  16. mkdir -p $HOME/.kube

  17. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

  18. sudo chown $(id -u):$(id -g) $HOME/.kube/config

  19. kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

  20. On the worker instance

  21. kubeadm join 172.31.31.170:6443 --token zu75b2.gcq3a7pgf6rt17zz --discovery-token-ca-cert-hash sha256:7ebc70e3c5fd3672183b2ac41c0e0f136dca8cbc77f918c9419931ca4e177dec

  22. On original Master instance terminal

  23. press y

  24. watch kubectl get pods --all-namespaces

  25. On the Master instance terminal

  26. source /etc/profile && pushd ./examples/deployer && go build && popd && ./examples/deployer/deployer
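
Before step 26, it may help to confirm that both nodes are Ready and that the cluster pods have settled; a minimal sketch, assuming the kubeconfig set up in steps 16-18:

# Both the master and the joined worker should report Ready
kubectl get nodes -o wide
# Print any pod that is not yet Running or Completed (only the header line should remain)
kubectl get pods --all-namespaces | grep -vE 'Running|Completed'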

Expected behavior
The deployer should deploy the example functions successfully.

Logs

step 4 logs

ubuntu@ip-172-31-31-170:~$ mkdir -p $HOME/.kube
ubuntu@ip-172-31-31-170:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite '/home/ubuntu/.kube/config'?
ubuntu@ip-172-31-31-170:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
ubuntu@ip-172-31-31-170:~$ kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
namespace/kube-flannel created
serviceaccount/flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
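
Note that the vHive setup scripts already install Calico (as pointed out later in this thread and visible in the kube-system pods below), so applying the Flannel manifest here leaves two CNI plugins on the cluster. A quick, hedged way to see which CNI DaemonSets are present:

# With the steps above, both calico-node and kube-flannel-ds show up
kubectl get daemonsets --all-namespaces | grep -Ei 'calico|flannel'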

step 5 logs

ubuntu@ip-172-31-16-232:~/vhive$ sudo kubeadm join 172.31.31.170:6443 --token zu75b2.gcq3a7pgf6rt17zz --discovery-token-ca-cert-hash sha256:7ebc70e3c5fd3672183b2ac41c0e0f136dca8cbc77f918c9419931ca4e177dec
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

step 6 logs

ubuntu@ip-172-31-31-170:~/vhive$ kubectl get pods --all-namespaces
NAMESPACE          NAME                                       READY   STATUS                  RESTARTS         AGE
istio-system       cluster-local-gateway-fffb9f589-ttljp      0/1     Running                 0                7h50m
istio-system       istio-ingressgateway-778db64bb6-hms6b      0/1     Running                 0                7h50m
istio-system       istiod-85bf857c79-r79k2                    1/1     Running                 0                7h50m
knative-eventing   eventing-controller-6b5b744bfd-82zjf       0/1     Pending                 0                12m
knative-eventing   eventing-webhook-75cdd7c68-c56mn           1/1     Running                 0                7h44m
knative-eventing   imc-controller-565df566f8-lkg8z            1/1     Running                 0                7h44m
knative-eventing   imc-dispatcher-5bf6c7d945-fmkm5            1/1     Running                 0                7h44m
knative-eventing   mt-broker-controller-575d4c9f77-xtt6x      0/1     Pending                 0                7h44m
knative-eventing   mt-broker-filter-746ddf5785-289pm          0/1     Pending                 0                12m
knative-eventing   mt-broker-ingress-7bff548b5b-v8cdl         0/1     Pending                 0                12m
knative-serving    activator-5cc89f4c4d-b7gk2                 0/1     Running                 87 (87s ago)     7h45m
knative-serving    autoscaler-6fb596f4bb-92n9g                1/1     Running                 0                7h45m
knative-serving    controller-6b5874c54-swbzl                 1/1     Running                 0                7h45m
knative-serving    default-domain-cgssc                       0/1     Error                   0                7h45m
knative-serving    default-domain-z7ctm                       0/1     Pending                 0                7h44m
knative-serving    domain-mapping-5b6c878f85-mdkfh            1/1     Running                 0                7h45m
knative-serving    domainmapping-webhook-59f98dc77b-4ns98     1/1     Running                 0                7h45m
knative-serving    net-istio-controller-777b6b4d89-chlcw      1/1     Running                 0                7h45m
knative-serving    net-istio-webhook-78665d59fd-qcf8f         1/1     Running                 0                7h45m
knative-serving    webhook-79f8449d8f-7mdz5                   1/1     Running                 0                7h45m
kube-flannel       kube-flannel-ds-2tx4v                      0/1     CrashLoopBackOff        96 (3m46s ago)   7h52m
kube-flannel       kube-flannel-ds-6g2qd                      0/1     CrashLoopBackOff        97 (40s ago)     7h52m
kube-system        calico-kube-controllers-567c56ff98-brvs4   1/1     Running                 0                7h51m
kube-system        calico-node-bqctf                          0/1     Running                 0                12m
kube-system        calico-node-wckkn                          0/1     Running                 0                7h51m
kube-system        coredns-565d847f94-fd95b                   1/1     Running                 0                7h56m
kube-system        coredns-565d847f94-rglb6                   1/1     Running                 0                7h56m
kube-system        etcd-ip-172-31-31-170                      1/1     Running                 0                7h56m
kube-system        kube-apiserver-ip-172-31-31-170            1/1     Running                 0                7h56m
kube-system        kube-controller-manager-ip-172-31-31-170   1/1     Running                 0                7h56m
kube-system        kube-proxy-jqqsd                           1/1     Running                 0                7h52m
kube-system        kube-proxy-sqqms                           1/1     Running                 0                7h56m
kube-system        kube-scheduler-ip-172-31-31-170            1/1     Running                 0                7h56m
metallb-system     controller-844979dcdc-fvbqj                1/1     Running                 0                7h51m
metallb-system     speaker-749l5                              1/1     Running                 0                7h51m
metallb-system     speaker-c8f2b                              1/1     Running                 0                7h51m
registry           docker-registry-pod-75gx8                  1/1     Running                 0                7h45m
registry           registry-etc-hosts-update-lh4ll            0/1     Init:CrashLoopBackOff   84 (2m33s ago)   7h45m
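
For the pods stuck in CrashLoopBackOff, Pending, or Init errors above, the usual next step is to pull their logs and events; a sketch using pod names taken from this listing:

# Why the Flannel DaemonSet keeps crashing (log of the previously terminated container)
kubectl -n kube-flannel logs kube-flannel-ds-2tx4v --previous
# Scheduling events for a Pending pod
kubectl -n knative-eventing describe pod eventing-controller-6b5b744bfd-82zjf
# Events for the failing registry init container
kubectl -n registry describe pod registry-etc-hosts-update-lh4ll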

step 7 logs

~/vhive/examples/deployer ~/vhive
go: downloading github.com/sirupsen/logrus v1.8.1
go: downloading golang.org/x/sys v0.0.0-20191026070338-33540a1f6037
~/vhive
WARN[0010] Failed to deploy function pyaes-0, configs/knative_workloads/pyaes.yaml: exit status 1
Error: Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": context deadline exceeded
Run 'kn --help' for usage

INFO[0010] Deployed function pyaes-0
WARN[0010] Failed to deploy function helloworld-0, configs/knative_workloads/helloworld.yaml: exit status 1
Error: Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": dial tcp 10.102.16.44:443: i/o timeout
Run 'kn --help' for usage

INFO[0010] Deployed function helloworld-0
WARN[0010] Failed to deploy function pyaes-1, configs/knative_workloads/pyaes.yaml: exit status 1
Error: Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": context deadline exceeded
Run 'kn --help' for usage

INFO[0010] Deployed function pyaes-1
INFO[0010] Deployment finished
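
All three failures point at the Knative Serving defaulting webhook timing out. A hedged way to check whether that webhook has healthy endpoints behind it (the service name is taken from the error URL; the app=webhook label is an assumption based on stock Knative Serving manifests):

# The error calls https://webhook.knative-serving.svc:443
kubectl -n knative-serving get svc webhook
kubectl -n knative-serving get endpoints webhook
kubectl -n knative-serving get pods -l app=webhook

With two CNI plugins installed (Flannel on top of Calico), cross-node pod traffic such as this webhook call is a likely casualty, which matches the advice given later in the thread.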

Notes
Currently, we support only Ubuntu 18 (x86) bare-metal hosts; however, we encourage users to report issues that appear in different settings. We will try to help and potentially include these scenarios in our CI if there is enough interest from the community.

pzg250 commented 1 year ago

I assume that the failure in step 7 is caused by the abnormal pod statuses in step 6. Any advice? Thanks in advance!

pzg250 commented 1 year ago

Note: when I run setup_node.sh there is an error; I'm not sure whether it is the root cause of this issue.

sysctl: setting key "net.ipv4.conf.all.promote_secondaries": Invalid argument
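
This error may or may not be related; as a quick, hedged check, one can inspect the key directly and see where it is being set:

# Does the key exist, and what does the kernel currently report?
sysctl net.ipv4.conf.all.promote_secondaries
# Where is the setting applied from?
grep -rs promote_secondaries /etc/sysctl.conf /etc/sysctl.d/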

ustiugov commented 1 year ago

@pzg250 thank you for the details. Can you try the quickstart guide? Does it work for you?

You seem to be combining incompatible technologies: you run the setup scripts with the stock-only option (i.e., with containers) and with estargz (which works only for containers), yet you also run firecracker-containerd... and then you install Flannel even though our scripts install Calico.
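
If the goal is to keep the Calico install that the scripts set up, the Flannel manifest applied earlier can be removed with the same file; a sketch (a clean reinstall is the more reliable path):

# Undo the earlier 'kubectl apply' of the Flannel manifest
kubectl delete -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml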

May I ask what you are trying to achieve?

pzg250 commented 1 year ago

Hi @ustiugov, thank you for your response. Yes, I will try to reinstall. So that means that if I use stock-only, I should run the following steps, right?

  1. On both nodes
  2. git clone --depth=1 https://github.com/vhive-serverless/vhive.git
  3. cd vhive
  4. mkdir -p /tmp/vhive-logs
  5. ./scripts/cloudlab/setup_node.sh stock-only use-stargz > >(tee -a /tmp/vhive-logs/setup_node.stdout) 2> >(tee -a /tmp/vhive-logs/setup_node.stderr >&2)
  6. On worker node
  7. ./scripts/cluster/setup_worker_kubelet.sh stock-only > >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stdout) 2> >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stderr >&2)
  8. sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
  9. On master node
  10. sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
  11. ./scripts/cluster/create_multinode_cluster.sh stock-only > >(tee -a /tmp/vhive-logs/create_multinode_cluster.stdout) 2> >(tee -a /tmp/vhive-logs/create_multinode_cluster.stderr >&2)
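
For reference, a hedged sketch of what still follows step 11, assuming (as in the original report) that create_multinode_cluster.sh prints a kubeadm join command and waits for confirmation: run the printed join on the worker, confirm on the master, and skip the manual Flannel install, since the scripts already set up Calico.

# On the worker: run the kubeadm join command printed by create_multinode_cluster.sh (token and hash as printed)
# On the master: press y when prompted, then verify the cluster
kubectl get nodes
watch kubectl get pods --all-namespaces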

Regarding "may I ask what you are trying to achieve?": someone asked me to help him set up the vHive environment; I think he wants to run some algorithm on vHive.

pzg250 commented 1 year ago

It seems to work by following these steps. Thanks @ustiugov. I also ran into an error with the example test; I will open a new ticket for that. Closing this.