Closed ymc101 closed 2 months ago
Please attach Firecracker logs from /tmp/vhive-logs/
from worker node: firecracker.stderr.zip firecracker.stdout.zip
Hi @leokondrashov, I just tried running the setup, deployment, and invocation steps on CloudLab, and they proceeded smoothly without incident. However, the output file rps0.00_lat.csv is empty. Is this expected behaviour? Am I missing certain steps or configurations needed to get the latencies of the function runs?
For reference, below is the endpoints.json file from the master node:
[
  {
    "hostname": "helloworld-0.default.192.168.1.240.sslip.io",
    "eventing": false,
    "matchers": null
  },
  {
    "hostname": "pyaes-0.default.192.168.1.240.sslip.io",
    "eventing": false,
    "matchers": null
  },
  {
    "hostname": "pyaes-1.default.192.168.1.240.sslip.io",
    "eventing": false,
    "matchers": null
  },
  {
    "hostname": "rnn-serving-0.default.192.168.1.240.sslip.io",
    "eventing": false,
    "matchers": null
  },
  {
    "hostname": "rnn-serving-1.default.192.168.1.240.sslip.io",
    "eventing": false,
    "matchers": null
  },
  {
    "hostname": "rnn-serving-2.default.192.168.1.240.sslip.io",
    "eventing": false,
    "matchers": null
  }
]
Sorry for the late follow-up. The 0.00 in the name of the file means that all invocations failed. The output of the invoker run should state how many requests succeeded. The failures might be caused by cold starts taking some time, so you can try rerunning the invoker several times to warm the instances up. If the requests still fail, please provide the output of kubectl get pods. There should be pods in the Running state; if there are not, please run kubectl describe pod <pod_name> on them.
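The check described above can be sketched in Python. This is only an illustration: it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...), and the pod names and statuses in the sample text are made up, not taken from this cluster.

```python
def pods_to_describe(get_pods_output):
    """Return `kubectl describe pod` commands for pods that are not Running."""
    commands = []
    for line in get_pods_output.splitlines():
        fields = line.split()
        # Skip blank lines and the NAME/READY/STATUS header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if status != "Running":
            commands.append(f"kubectl describe pod {name}")
    return commands


# Hypothetical sample output; real pod names will differ.
sample_output = """\
NAME                 READY   STATUS             RESTARTS   AGE
helloworld-0-abc     2/2     Running            0          5m
pyaes-0-def          0/2     Pending            0          5m
rnn-serving-0-ghi    1/2     CrashLoopBackOff   4          5m
"""

for command in pods_to_describe(sample_output):
    print(command)
```

Pasting the real output of kubectl get pods in place of sample_output gives the list of describe commands worth running.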
I ran it 2 more times after getting 0.00, and this was the contents of the csv:
8651
8922
17281
64952
41603
Is this expected behaviour? If so, what does each number correspond to, and what are the units? Are they in milliseconds?
They are the end-to-end delay measurements for each of the requests in microseconds.
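As a rough illustration of reading such a file, the sketch below converts the microsecond values to milliseconds and summarizes them. It assumes the file holds one integer per line, as in the contents posted above; the function name summarize_latencies is hypothetical and not part of the invoker.

```python
import statistics


def summarize_latencies(lines):
    """Summarize latencies given as one microsecond value per line."""
    # Skip blank lines; an empty file yields an empty list.
    micros = [int(line) for line in lines if line.strip()]
    if not micros:
        return None  # no successful requests were recorded
    return {
        "count": len(micros),
        "mean_ms": statistics.mean(micros) / 1000.0,
        "median_ms": statistics.median(micros) / 1000.0,
        "max_ms": max(micros) / 1000.0,
    }


# The five values posted above, in microseconds.
sample = ["8651", "8922", "17281", "64952", "41603"]
summary = summarize_latencies(sample)
print(summary)
```

To use it on a real file, pass the open file object instead of the list, for example summarize_latencies(open(path)) with path pointing at the CSV.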
Does each number correspond to one function in endpoints.json? Or does it represent the time it takes for all functions to finish executing (concurrently or sequentially)?
Incidentally, I tried running the invoker once on the worker node and I got 1 value in the rps_0.20.csv file, so I'm a bit confused about what the output represents.
It reports end-to-end latencies for requests sent to the functions in endpoints.json in round-robin fashion, one number per request. By default, it sends 1 request per second for 5 seconds (both can be changed: https://github.com/vhive-serverless/vSwarm/blob/main/tools/invoker/README.md).
The number in the file's name is the number of successful invocations per second (0.2 means 1 successful request in 5 seconds). Cold starts can cause a low success rate, since any request that takes longer than 5 seconds to respond is considered failed. That's why rerunning the invoker helps to get more representative results.
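The arithmetic behind this can be sketched as follows. This is not the invoker's actual code: the function names are hypothetical, and the file-name pattern is only inferred from the rps0.00_lat.csv name mentioned earlier in the thread.

```python
def plan_requests(endpoints, rps=1, duration_s=5):
    """Assign each request to an endpoint in round-robin order."""
    total = int(rps * duration_s)
    return [endpoints[i % len(endpoints)] for i in range(total)]


def achieved_rps(latencies_s, duration_s=5, timeout_s=5):
    """Successful requests per second; None or too-slow requests count as failed."""
    successes = sum(
        1 for lat in latencies_s if lat is not None and lat <= timeout_s
    )
    return successes / duration_s


def output_name(rps):
    # Assumed naming pattern, modelled on "rps0.00_lat.csv".
    return f"rps{rps:.2f}_lat.csv"


endpoints = [
    "helloworld-0", "pyaes-0", "pyaes-1",
    "rnn-serving-0", "rnn-serving-1", "rnn-serving-2",
]
# With the defaults, 5 requests go to the first 5 of the 6 endpoints.
print(plan_requests(endpoints))

# One request answered in 0.9 s, four that never answered in time.
rps = achieved_rps([None, None, None, None, 0.9])
print(rps, output_name(rps))
```

With the default 5 requests and 6 endpoints, the last endpoint in the list is never invoked in a single run, which is one reason a run can leave some functions cold.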
I see. Can I also check how to run some of the benchmark functions in vSwarm? I am trying out the fibonacci one, but I got a make: docker: Command not found error when trying to run make all-images. I tried installing docker with sudo apt-get install docker but am still getting the same error. Do you have any idea what the issue might be?
Hi @leokondrashov, do you have any input on this?
I'm sorry, I thought I had sent the comment. It's better to ask questions about benchmarks in the vSwarm repository. Regarding the docker issue, docker is installed with apt install docker.io.
Alright, I'll start a new issue in the vSwarm repository if I have further questions on the benchmarks. As for the docker issue, I tried installing docker.io, but apt is unable to find the package:
E: Unable to locate package docker.io
E: Couldn't find any package by glob 'docker.io'
E: Couldn't find any package by regex 'docker.io'
Am I supposed to run sudo apt-get upgrade first?
Yes, it is better to run sudo apt update first. Please also provide the distro info, because it is weird that Ubuntu can't find the docker.io package.
Otherwise, proceed with the official guide on docker installation: https://docs.docker.com/engine/install/ubuntu/.
Nothing gets updated when I run the upgrade, and it still cannot find the docker.io package. I am running Ubuntu 20.04 LTS (GNU/Linux 5.4.0-164-generic x86_64), based on the CloudLab profile provided in the vHive quickstart guide. Was the docker installation working fine when you tested on this setup previously?
Edit: it can find the package now after running update instead.
I just tried on a fresh xl170 CloudLab node, and it works. I ran two commands: sudo apt update; sudo apt install docker.io. Docker installs and runs successfully. Also, you might need to add the user to the docker group: https://docs.docker.com/engine/install/linux-postinstall/, but that doesn't affect the availability of the package.
Thanks. The docker make works fine now. But there now seems to be an issue with deploying the benchmark, so I'll start a new issue in the vSwarm repository for that.
Hi, I am trying out the whole vHive setup with function deployment and invocation using 1 master node and 1 worker node running on 2 separate VMs, and I encountered some errors when running the deployer client
source /etc/profile && pushd ./tools/deployer && go build && popd && ./tools/deployer/deployer -funcPath ~/vhive/configs/knative_workloads
This was the output:
Additionally, below are logs from some commands I tried:
kubectl describe deployment:
kubectl get revisions and kubectl describe revision <name>: