vhive-serverless / vSwarm

A suite of representative serverless cloud-agnostic (i.e., dockerized) benchmarks
MIT License

timeout error while deploying chained-function-serving #46

Closed: dhuang-esl closed this issue 2 years ago

dhuang-esl commented 2 years ago

Describe the bug
After configuring vHive by following the quick start tutorial, I tried to deploy the chained-function-serving benchmark, but it could not be deployed.

To Reproduce
After configuring vHive with the vhive-ubuntu20 profile on CloudLab, I tried to deploy chained-function-serving with the following command, as described in the Running benchmarks tutorial:

./tools/kn_deploy.sh benchmarks/chained-function-serving/knative_yamls/inline/*

Expected behavior
The deployment finishes successfully.

Logs

~/vSwarm$ ./tools/kn_deploy.sh benchmarks/chained-function-serving/knative_yamls/inline/*
++ set -e
++ '[' 3 -eq 0 ']'
++ for pattern in "$@"
++ for file in $pattern
++ echo 'applying benchmarks/chained-function-serving/knative_yamls/inline/service-consumer.yaml'
applying benchmarks/chained-function-serving/knative_yamls/inline/service-consumer.yaml
++ kn service apply -f /dev/fd/63
+++ envsubst
Creating service 'consumer' in namespace 'default':

  0.034s The Route is still working to reflect the latest desired specification.
  0.041s ...
  0.053s Configuration "consumer" is waiting for a Revision to become ready.
Error: timeout: service 'consumer' not ready after 600 seconds
Run 'kn --help' for usage
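
For readers tracing the log: the xtrace lines above (set -e, the [ 3 -eq 0 ] argument check, two nested for loops, and kn service apply -f /dev/fd/63 fed by envsubst) suggest that tools/kn_deploy.sh does roughly what the sketch below does. It is reconstructed from the trace alone and may differ from the real script. In particular, feeding each file to envsubst through an input redirect is an assumption, because redirects do not show up in the trace, and the function name deploy_yamls is hypothetical. The /dev/fd/63 path is how bash hands a process substitution <(...) to a command, which is why kn never sees the original file name.

```shell
#!/usr/bin/env bash
# Sketch of a kn_deploy.sh-style loop, reconstructed from the xtrace output above.
# Assumption: each YAML file is fed to envsubst through an input redirect.
set -e

# Hypothetical name; applies every YAML file matched by each argument.
deploy_yamls() {
    # Mirrors the "[ N -eq 0 ]" check in the trace: require at least one argument.
    if [ "$#" -eq 0 ]; then
        echo "usage: deploy_yamls <yaml-file-or-glob>..." >&2
        return 1
    fi
    for pattern in "$@"; do
        # Unquoted on purpose, so a quoted glob such as "dir/*" still expands here.
        for file in $pattern; do
            echo "applying $file"
            # envsubst replaces ${VAR} references with environment values;
            # <(...) hands kn a path such as /dev/fd/63 instead of the file name.
            kn service apply -f <(envsubst < "$file")
        done
    done
}
```

Called at top level, for example as deploy_yamls "benchmarks/chained-function-serving/knative_yamls/inline/*", set -e makes the shell stop at the first kn call that exits with an error, which matches how the log above ends at the timeout for consumer without applying the remaining files.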

I also checked the deployment status:

~/vSwarm$ kn service list consumer
NAME            URL                                                   LATEST                AGE   CONDITIONS   READY   REASON
consumer        http://consumer.default.192.168.1.240.sslip.io                              10m   0 OK / 3     False   RevisionMissing : Configuration "consumer" does not have any ready Revision.

Notes
By the way, I noticed that the Running benchmarks tutorial says "The function deployment can be monitored using kn service list --all". However, that command fails with an error:

~/vSwarm$ kn service list --all
Error: unknown flag: --all for 'kn service list'
Run 'kn service list --help' for usage

So, did you actually mean to run kn service list --all-namespaces?

ustiugov commented 2 years ago

@dhuang-esl regarding kn service list --all-namespaces, you are right. Can you please submit a fix via a PR?

I think I know what the problem is. At the moment, the vSwarm Knative YAML files are written for containers, not MicroVMs. Info on running vHive with containers is here. It would be good to clarify this in the *.md file you mentioned.

We would welcome docs improvements 👍

dhuang-esl commented 2 years ago

Hi @ustiugov, thanks for clarifying how to run the benchmarks with containers and for the kn command! I have also submitted a fix in PR #59.

ustiugov commented 2 years ago

Thanks! Please do not close this issue before your fix is merged.

adayaru commented 2 years ago

Is the original issue being tracked under another issue ID? I am still hitting the same error: Configuration "consumer" is waiting for a Revision to become ready.

I tried fibonacci too, with the same result:

Command used: kn service apply -f ./yamls/knative/kn-fibonacci-python.yaml

Result: the service stays "waiting for a Revision".

Status:

kn services list
NAME               URL                                                      LATEST   AGE   CONDITIONS   READY     REASON
fibonacci-python   http://fibonacci-python.default.192.168.1.240.sslip.io            43s   0 OK / 3     Unknown   RevisionMissing : Configuration "fibonacci-python" is waiting for a Revision to become ready.

To add: I am able to run the examples from the vHive Quickstart guide at: https://github.com/ease-lab/vhive/blob/main/docs/quickstart_guide.md#iv-deploying-and-invoking-functions-in-vhive

ustiugov commented 2 years ago

@adayaru this issue was fixed by #59, I believe.

Is the issue you observe sporadic or deterministic? Sometimes resetting the cluster helps.

I'm closing the issue now; please reopen it if the issue is deterministic and reproduces.

adayaru commented 2 years ago

The fix in #59 is a documentation fix; it didn't change my scenario.

I followed the instructions as given in https://github.com/ease-lab/vSwarm/tree/main/benchmarks/fibonacci#running-this-benchmark-using-knative

I am using a single-node vHive serverless cluster with Firecracker MicroVMs.

Hardware: a VMware VM with 4 CPUs and 32 GB RAM, so I presume machine resources are not an issue.

If this doesn't work, one option for me is to "port" the vSwarm code into the vHive examples (say, by replacing the hello world code with fibonacci) and try that.

ustiugov commented 2 years ago

@adayaru I believe your problem is unrelated to this Issue.

Firecracker-based vHive requires slightly different format of YAML files when deploying a function. Take a look at the Firecracker YAML here. In particular, these lines:

          env:
            - name: GUEST_PORT # Port on which the firecracker-containerd container is accepting requests
              value: "50051"
            - name: GUEST_IMAGE # Container image to use for firecracker-containerd container
              value: "ghcr.io/ease-lab/helloworld:var_workload"
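
As a quick pre-check before kn service apply, one can look for both env entries in a YAML file. The sketch below is only a heuristic under stated assumptions: has_firecracker_env is a hypothetical helper, it does a plain-text grep rather than parsing YAML, and it cannot tell whether the GUEST_IMAGE value itself is correct.

```shell
#!/usr/bin/env bash
# Hypothetical helper: heuristically check that a Knative service YAML carries the
# GUEST_PORT and GUEST_IMAGE env entries shown in the Firecracker example above.
# This is a plain-text search, not a YAML parser: it also matches commented-out lines
# and does not check which container the entries belong to.
has_firecracker_env() {
    local file=$1 missing=0 var
    for var in GUEST_PORT GUEST_IMAGE; do
        # Match "name: VAR" followed by whitespace, a comment, or end of line.
        if ! grep -Eq "name:[[:space:]]*${var}([[:space:]]|#|\$)" "$file"; then
            echo "$file: no env entry named $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It returns success only when both entries are present and prints one line to stderr per missing entry, so a container-style YAML without the env block would report both names.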

adayaru commented 2 years ago

Thanks @ustiugov! I had already tried a YAML similar to the Firecracker-based vHive example; here's what I tried:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: fibonacci-python
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: crccheck/hello-world:latest # Stub image. See https://github.com/ease-lab/vhive/issues/68
          ports:
            - name: h2c # For GRPC support
              containerPort: 50051
          env:
            - name: GUEST_PORT # Port on which the firecracker-containerd container is accepting requests
              value: "50051"
            - name: GUEST_IMAGE # Container image to use for firecracker-containerd container
              value: "docker.io/vhiveease/fibonacci-python:latest"

When I applied the YAML above, I got this output:

root@ubuntu77# kn services apply -f kn-vhive-fibonacci-python.yaml
Creating service 'fibonacci-python' in namespace 'default':

  0.035s The Route is still working to reflect the latest desired specification.
  0.057s ...
  0.083s Configuration "fibonacci-python" is waiting for a Revision to become ready.
  5.038s ...
  5.048s Ingress has not yet been reconciled.
  5.106s Waiting for load balancer to be ready
  5.295s Ready to serve.

Service 'fibonacci-python' created to latest revision 'fibonacci-python-00001' is available at URL:
http://fibonacci-python.default.192.168.1.240.sslip.io
root@ubuntu77#

It says that the "service" is up:

root@ubuntu77# kn services list --all-namespaces
NAMESPACE   NAME               URL                                                      LATEST                   AGE   CONDITIONS   READY   REASON
default     fibonacci-python   http://fibonacci-python.default.192.168.1.240.sslip.io   fibonacci-python-00001   19h   3 OK / 3     True   
root@ubuntu77#

But test-client fails:

root@ubuntu77# ./test-client --addr $URL:80 --name 13
2022/08/17 08:56:01 could not greet: rpc error: code = Unimplemented desc = Method not found!
root@ubuntu77#
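
The test-client command above relies on URL already being set. One way to build the --addr value from the URL that kn prints is sketched below. grpc_addr is a hypothetical helper, and it assumes that test-client wants a bare host:port (gRPC dial targets normally carry no http:// scheme) and that kn service describe supports the -o url output format in the kn version in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Knative service URL such as
# http://fibonacci-python.default.192.168.1.240.sslip.io into the host:port form
# passed to test-client --addr. Assumption: the client expects no URL scheme.
grpc_addr() {
    local url=$1 port=${2:-80}
    url=${url#http://}    # strip a plain-HTTP scheme, if present
    url=${url#https://}   # strip an HTTPS scheme, if present
    url=${url%/}          # drop one trailing slash, if present
    echo "${url}:${port}"
}

# Example use on a live cluster (assumes kn's "-o url" output format):
#   URL=$(kn service describe fibonacci-python -o url)
#   ./test-client --addr "$(grpc_addr "$URL")" --name 13
```

The port defaults to 80, matching the :80 suffix in the original command; a second argument overrides it.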

I have a couple of questions:

  1. vSwarm recommends using vHive, and vHive recommends using KVM and Firecracker containers. The vHive samples run fine. However, the vSwarm sample (I just tried fibonacci, since it's a simple function) doesn't even come up. Here's the output:
root@ubuntu77:~/vSwarm/benchmarks/fibonacci# kn service apply -f ./yamls/knative/kn-fibonacci-python.yaml
Creating service 'fibonacci-python' in namespace 'default':

  0.046s The Route is still working to reflect the latest desired specification.
  0.080s ...
  0.141s Configuration "fibonacci-python" is waiting for a Revision to become ready.
^C
root@ubuntu77#

You mentioned that "Firecracker-based vHive requires slightly different format of YAML files when deploying a function." So I am probably doing something wrong in the YAML I tried. Please take a quick look at the modified YAML I pasted above and let me know whether the modifications are OK. I appreciate the help!

  2. Will the same Docker images that are in the example work with vHive too? I compared the Dockerfile contents under the vSwarm benchmark directory and the vHive example directory and found some minor differences. For instance, the base image used by vHive was vhiveease/py_grpc:base (which also copied files from vhiveease/py_grpc:builder_grpc), while the base image used by vSwarm is vhiveease/python-slim:latest. Should I build the images differently for use with vHive running on firecracker-containerd?

ustiugov commented 1 year ago

@adayaru sorry for the (very) late response. vHive doesn't recommend any particular sandbox technology but supports all of them; maybe we need to clarify that in the docs.

The YAML file seems right. If the issue is still relevant, please open a new issue (please don't comment on closed issues, as that may delay responses) and supply logs from firecracker-containerd, vhive, and containerd.