vhive-serverless / vHive

vHive: Open-source framework for serverless experimentation

Unable to deploy example functions [Error: failed calling webhook "webhook.serving.knative.dev"] #749

Closed Farrrrland closed 1 year ago

Farrrrland commented 1 year ago

Describe the bug Hi, I'm setting up a vHive cluster following the vHive Quickstart Guide. I used the provided vhive-ubuntu20 profile on CloudLab and started the cluster successfully.

However, the deployment of the example functions in part 4 (pyaes-[0|1] and helloworld-0) failed, which is unexpected.

To Reproduce

  1. Use exactly the 2-node cluster (1 master and 1 worker) as provided on CloudLab.
  2. Follow the instructions in the vHive Quickstart Guide.
  3. Change install_stock.sh as suggested in Issue #742 to avoid errors with GPG keys.

Expected behavior Correctly deployed functions

Logs

Farland@node-0:~/vhive$ source /etc/profile && pushd ./examples/deployer && go build && popd && ./examples/deployer/deployer
~/vhive/examples/deployer ~/vhive
go: downloading github.com/sirupsen/logrus v1.8.1
go: downloading golang.org/x/sys v0.0.0-20191026070338-33540a1f6037
~/vhive
WARN[0010] Failed to deploy function pyaes-0, configs/knative_workloads/pyaes.yaml: exit status 1
Error: Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": context deadline exceeded
Run 'kn --help' for usage

INFO[0010] Deployed function pyaes-0
WARN[0010] Failed to deploy function helloworld-0, configs/knative_workloads/helloworld.yaml: exit status 1
Error: Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": context deadline exceeded
Run 'kn --help' for usage

INFO[0010] Deployed function helloworld-0
WARN[0010] Failed to deploy function pyaes-1, configs/knative_workloads/pyaes.yaml: exit status 1
Error: Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": context deadline exceeded
Run 'kn --help' for usage

INFO[0010] Deployed function pyaes-1
INFO[0010] Deployment finished
Farland@node-0:~/vhive$ kubectl get pods --all-namespaces
NAMESPACE          NAME                                                                    READY   STATUS    RESTARTS   AGE
istio-system       cluster-local-gateway-fffb9f589-d5tst                                   1/1     Running   0          12m
istio-system       istio-ingressgateway-778db64bb6-qbms4                                   1/1     Running   0          12m
istio-system       istiod-85bf857c79-jr6jr                                                 1/1     Running   0          12m
knative-eventing   eventing-controller-6b5b744bfd-9vz4n                                    1/1     Running   0          11m
knative-eventing   eventing-webhook-75cdd7c68-zw6hk                                        1/1     Running   0          11m
knative-eventing   imc-controller-565df566f8-8874q                                         1/1     Running   0          11m
knative-eventing   imc-dispatcher-5bf6c7d945-cdtbb                                         1/1     Running   0          11m
knative-eventing   mt-broker-controller-575d4c9f77-t4wjk                                   1/1     Running   0          11m
knative-eventing   mt-broker-filter-746ddf5785-9nctd                                       1/1     Running   0          11m
knative-eventing   mt-broker-ingress-7bff548b5b-7hb4g                                      1/1     Running   0          11m
knative-serving    activator-64fd97c6bd-t8tqk                                              1/1     Running   0          12m
knative-serving    autoscaler-78bd654674-t78mm                                             1/1     Running   0          12m
knative-serving    controller-67fbfcfc76-cm9mh                                             1/1     Running   0          12m
knative-serving    default-domain-2phtb                                                    0/1     Error     0          10m
knative-serving    default-domain-5mbzp                                                    0/1     Error     0          10m
knative-serving    default-domain-b62nw                                                    0/1     Error     0          11m
knative-serving    default-domain-dl9c4                                                    0/1     Error     0          11m
knative-serving    default-domain-dtfhk                                                    0/1     Error     0          12m
knative-serving    default-domain-snl7k                                                    0/1     Error     0          10m
knative-serving    default-domain-tj9wp                                                    0/1     Error     0          10m
knative-serving    default-domain-tjgzc                                                    0/1     Error     0          11m
knative-serving    default-domain-wrh9h                                                    0/1     Error     0          10m
knative-serving    default-domain-wtqkn                                                    0/1     Error     0          9m46s
knative-serving    default-domain-zmxbq                                                    0/1     Error     0          11m
knative-serving    domain-mapping-874f6d4d8-sq6wm                                          1/1     Running   0          12m
knative-serving    domainmapping-webhook-67f5d487b7-d6rsk                                  1/1     Running   0          12m
knative-serving    net-istio-controller-777b6b4d89-tdww8                                   1/1     Running   0          11m
knative-serving    net-istio-webhook-78665d59fd-dmc7q                                      1/1     Running   0          11m
knative-serving    webhook-9bbf89ffb-9thzb                                                 1/1     Running   0          12m
kube-system        calico-kube-controllers-567c56ff98-f7grj                                1/1     Running   0          13m
kube-system        calico-node-95gs7                                                       0/1     Running   0          13m
kube-system        calico-node-9lrj2                                                       0/1     Running   0          13m
kube-system        coredns-565d847f94-q8p2x                                                1/1     Running   0          18m
kube-system        coredns-565d847f94-z7vjg                                                1/1     Running   0          18m
kube-system        etcd-node-0.vhive-c.auto-faas-pg0.utah.cloudlab.us                      1/1     Running   0          18m
kube-system        kube-apiserver-node-0.vhive-c.auto-faas-pg0.utah.cloudlab.us            1/1     Running   0          18m
kube-system        kube-controller-manager-node-0.vhive-c.auto-faas-pg0.utah.cloudlab.us   1/1     Running   0          18m
kube-system        kube-proxy-gls7m                                                        1/1     Running   0          15m
kube-system        kube-proxy-n5j2c                                                        1/1     Running   0          18m
kube-system        kube-scheduler-node-0.vhive-c.auto-faas-pg0.utah.cloudlab.us            1/1     Running   0          18m
metallb-system     controller-844979dcdc-7cjnv                                             1/1     Running   0          13m
metallb-system     speaker-7t2gw                                                           1/1     Running   0          13m
metallb-system     speaker-rdr79                                                           1/1     Running   0          13m
registry           docker-registry-pod-7mcsh                                               1/1     Running   0          12m
registry           registry-etc-hosts-update-tmk95                                         1/1     Running   0          12m

There are also a bunch of knative-serving default-domain-xxxx pods that seem to have errors, but I'm not sure why. I wonder if I missed some configuration, but since I'm using the provided profile, I don't think that should be the problem.
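The pod listing above also shows both calico-node pods at 0/1 Ready, and "context deadline exceeded" on a webhook call typically means the API server cannot reach the webhook service in time. A minimal diagnostic sketch, assuming standard kubectl tooling (pod names taken from the listing above):

# inspect one of the failing default-domain jobs
kubectl -n knative-serving logs default-domain-2phtb
# check why the calico-node pods are not Ready
kubectl -n kube-system describe pod calico-node-95gs7
# verify the Knative webhook service has healthy endpoints
kubectl -n knative-serving get endpoints webhook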

Notes Currently, we support only Ubuntu 18 (x86) bare-metal hosts; however, we encourage users to report issues that appear in different settings. We will try to help and potentially include these scenarios in our CI, given enough interest from the community.

Farrrrland commented 1 year ago

It seems MetalLB was not correctly installed. The commands

kubectl apply -f $ROOT/configs/metallb/metallb-ipaddresspool.yaml
kubectl apply -f $ROOT/configs/metallb/metallb-l2advertisement.yaml

which are in setup_master_node.sh, lead to the following error:

Error from server (InternalError): error when creating "./configs/metallb/metallb-ipaddresspool.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s: context deadline exceeded

Still trying to figure out what leads to this error.
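One thing that might help narrow it down, as a sketch (assuming the upstream MetalLB manifests label their pods with app=metallb), is to confirm the MetalLB webhook pods are Ready before applying the address pool:

kubectl -n metallb-system get pods
# wait for the controller/speaker pods (label assumed from upstream manifests)
kubectl -n metallb-system wait --for=condition=ready pod -l app=metallb --timeout=120s
# then retry the two applies from setup_master_node.sh
kubectl apply -f ./configs/metallb/metallb-ipaddresspool.yaml
kubectl apply -f ./configs/metallb/metallb-l2advertisement.yaml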

Farrrrland commented 1 year ago

Deployment succeeds if I switch to the stock-only setup following the start-up guide, but I wonder why the difference happens?

However, invoking the example functions with pushd ./examples/invoker && go build && popd && ./examples/invoker/invoker still leads to errors:

~/vhive/examples/invoker ~/vhive ~/vhive
go: downloading github.com/golang/protobuf v1.5.2
go: downloading google.golang.org/grpc v1.47.0
go: downloading github.com/google/uuid v1.2.0
go: downloading google.golang.org/protobuf v1.28.0
go: downloading github.com/containerd/containerd v1.5.2
go: downloading go.opentelemetry.io/otel/sdk v0.20.0
go: downloading go.opentelemetry.io/otel v0.20.0
go: downloading go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.20.0
go: downloading go.opentelemetry.io/otel/exporters/trace/zipkin v0.20.0
go: downloading go.opentelemetry.io/otel/trace v0.20.0
go: downloading go.opentelemetry.io/contrib v0.20.0
go: downloading golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f
go: downloading github.com/openzipkin/zipkin-go v0.2.5
go: downloading go.opentelemetry.io/otel/metric v0.20.0
go: downloading google.golang.org/genproto v0.0.0-20220502173005-c8bf987b8c21
go: downloading golang.org/x/net v0.0.0-20220722155237-a158d28d115b
go: downloading golang.org/x/text v0.3.7
~/vhive ~/vhive
INFO[2023-06-05T01:04:38.304839337-06:00] Reading the endpoints from the file: endpoints.json 
WARN[2023-06-05T01:04:38.342029510-06:00] Failed to invoke helloworld-0.default.192.168.1.240.sslip.io:80, err=rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/plain; charset=utf-8" 
WARN[2023-06-05T01:04:39.339741528-06:00] Failed to invoke pyaes-0.default.192.168.1.240.sslip.io:80, err=rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/plain; charset=utf-8" 
WARN[2023-06-05T01:04:40.346752047-06:00] Failed to invoke pyaes-1.default.192.168.1.240.sslip.io:80, err=rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/plain; charset=utf-8" 
WARN[2023-06-05T01:04:41.326754045-06:00] Failed to invoke helloworld-0.default.192.168.1.240.sslip.io:80, err=rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/plain; charset=utf-8" 
WARN[2023-06-05T01:04:42.324663336-06:00] Failed to invoke pyaes-0.default.192.168.1.240.sslip.io:80, err=rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/plain; charset=utf-8" 
INFO[2023-06-05T01:04:43.305779753-06:00] Issued / completed requests: 5, 0            
INFO[2023-06-05T01:04:43.305855754-06:00] Real / target RPS: 0.00 / 1                  
INFO[2023-06-05T01:04:43.305878828-06:00] Experiment finished!                         
INFO[2023-06-05T01:04:43.305893579-06:00] The measured latencies are saved in rps0.00_lat.csv 
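A quick way to check whether the 502s come from the Knative services themselves, as a sketch with assumed commands (not output from this cluster), would be to verify that the revisions ever became Ready:

# list Knative services with their readiness and URLs
kn service list
kubectl get ksvc -A
# check whether any function pods were actually scheduled
kubectl get pods -n default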

The functions are correctly deployed with source /etc/profile && pushd ./examples/deployer && go build && popd && ./examples/deployer/deployer

~/vhive/examples/deployer ~/vhive ~/vhive
~/vhive ~/vhive
INFO[0002] Deployed function pyaes-1                    
INFO[0003] Deployed function pyaes-0                    
INFO[0003] Deployed function helloworld-0               
INFO[0003] Deployment finished  
Farrrrland commented 1 year ago

I successfully started a single-node cluster rather than a multi-node one, so I will close this issue.