vhive-serverless / vHive

vHive: Open-source framework for serverless experimentation
MIT License

Function deployment fails in multi node kind clusters #126

Open shyamjesal opened 3 years ago

shyamjesal commented 3 years ago

Setup: 1 master node, 2 worker nodes
Image used: vhiveease/vhive_dev_env
Error: Knative function containers fail to boot. Status loops between [Error, CrashLoopBackOff, Terminating].
Error log:

root@kind-control-plane:/vhive# kubectl logs helloworld-0-00001-deployment-7b8fcdb447-knnhf queue-proxy
time="2021-02-18T13:27:16Z" level=info msg="This is the vHive QP code"
required key GUEST_ADDR missing value

vfoehn commented 3 years ago

I experience the same error when trying to run vHive on a single-node minikube cluster. However, I don't think the cluster setup has anything to do with the error. It seems like the image docker.io/vhiveease/queue-39be6f1d08a095bd076a71d288d295b6 doesn't run correctly unless GUEST_ADDR is set (perhaps as an environment variable).

The following lines (introduced on 27.11.2020 and 30.11.2020) inside the file cri/container_create.go seem to require a value for GUEST_ADDR inside the function createQueueProxy(...):

guestIPEnv = "GUEST_ADDR"
...
guestIPKeyVal := &criapi.KeyValue{Key: guestIPEnv, Value: vmConfig.guestIP}
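The log line "required key GUEST_ADDR missing value" is consistent with a startup check of the following shape. This is only a minimal sketch, not the actual queue-proxy code: `requireEnv` is a hypothetical helper, and only the variable name `GUEST_ADDR` and the message text are taken from this thread.

```go
package main

import (
	"fmt"
	"os"
)

// guestIPEnv reuses the constant name quoted from cri/container_create.go.
const guestIPEnv = "GUEST_ADDR"

// requireEnv is a hypothetical helper (not part of vHive). It returns the
// value of key, or an error if the variable is unset or empty. The lookup
// function is injected so the check can be exercised without touching the
// real process environment.
func requireEnv(lookup func(string) (string, bool), key string) (string, error) {
	val, ok := lookup(key)
	if !ok || val == "" {
		return "", fmt.Errorf("required key %s missing value", key)
	}
	return val, nil
}

func main() {
	addr, err := requireEnv(os.LookupEnv, guestIPEnv)
	if err != nil {
		// A real container entrypoint would typically exit non-zero here,
		// which Kubernetes then reports as Error / CrashLoopBackOff.
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("guest address:", addr)
}
```

If the container is started by anything that does not inject `GUEST_ADDR` into its environment, a check like this fails on every restart, which would match the status loop reported above.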

Note: I have tried both versions (i.e., latest and c3d2e0ad00457c972b642a1b6cf7c67ac1a58a5dcb83207e244c3cf6e19befbe) of the image docker.io/vhiveease/queue-39be6f1d08a095bd076a71d288d295b6, but neither works for me. Since both versions are newer than the code changes described above, this doesn't surprise me.

If I decide to work with vHive in the future, I will take a closer look at this issue.

ustiugov commented 3 years ago

@vfoehn thank you for your comment.

vHive starts each VM with a tap device and a corresponding IP address (assigned by the vHive code, not by the CNI plugin), and passes that address to the queue-proxy through the environment variable you mentioned. This is the intended behavior.
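As an illustration of the producer side, the sketch below shows how a runtime shim could attach the guest IP to a container's environment before creating it. Everything here is an assumption made for illustration: `KeyValue` is a local stand-in for the CRI API type used in the snippet quoted earlier, `withGuestAddr` is a hypothetical helper, and the IP address and `EXAMPLE_VAR` are placeholders.

```go
package main

import "fmt"

// KeyValue is a local stand-in for the CRI API's KeyValue message, so that
// this sketch compiles without external dependencies.
type KeyValue struct {
	Key   string
	Value string
}

const guestIPEnv = "GUEST_ADDR"

// withGuestAddr (hypothetical) returns a new env list in which GUEST_ADDR is
// set to guestIP. Any existing GUEST_ADDR entry is dropped first, so the
// result contains exactly one entry for that key.
func withGuestAddr(envs []*KeyValue, guestIP string) []*KeyValue {
	out := make([]*KeyValue, 0, len(envs)+1)
	for _, kv := range envs {
		if kv.Key != guestIPEnv {
			out = append(out, kv)
		}
	}
	return append(out, &KeyValue{Key: guestIPEnv, Value: guestIP})
}

func main() {
	// Placeholder environment and address, for illustration only.
	envs := []*KeyValue{{Key: "EXAMPLE_VAR", Value: "example"}}
	for _, kv := range withGuestAddr(envs, "172.16.0.2") {
		fmt.Printf("%s=%s\n", kv.Key, kv.Value)
	}
}
```

The point of the sketch is that the variable is set by whatever component creates the container, so a setup that bypasses that component would start the queue-proxy without it.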

This issue is about a multi-node setup in a kind container (i.e., a kubernetes cluster running in Docker containers instead of VMs or bare metal). The multi-node setup should work on bare metal or VMs without any problem.

If you use vHive in a different scenario, please raise a separate issue and we can try to investigate it together.

vfoehn commented 3 years ago

@ustiugov thanks for the quick reply and the information.

I didn't follow the instructions from docs/quickstart_guide.md to a T, since I wanted to run vHive on a minikube cluster that I had already installed on my machine (as opposed to creating a new cluster using kubeadm). Consequently, it is quite possible that the error was caused by me or my setup. This is why I'm hesitant to raise a separate issue just yet. However, if I start using vHive for my project, I will investigate it more closely.

prateeksahu commented 2 years ago

Hi @ustiugov, I am running into a similar issue on a local single-node cluster that I set up by following the single-node steps described in the quick-start guide.

Since I used create_one_node_cluster and did not see any errors, I am assuming the whole deployment completed correctly. Any insight would be appreciated.

ustiugov commented 2 years ago

Hi @prateeksahu,

This issue was opened for kind containers. Are you running in kind or on bare metal / a VM? If your issue is not related to the kind setup, please open a new issue.

Then, could you please specify the exact steps (which you execute on a clean Ubuntu 18 node or VM) and supply relevant snippets of the logs, attaching the full logs?