Status: Closed. varunbpatil closed this pull request 3 years ago.
Hi @andresmgot, I have tested this change with 1000 requests and a concurrency of 50. I did not observe any deadlock. Without this change, the same command results in a deadlock pretty easily. The function I used to test this is stable/python/examples/preload.py, which I have included in this PR.
I have also added some inline comments on the PR where I thought an explanation would be necessary.
$ hey -n 1000 -c 50 -m GET -t 0 -disable-keepalive http://localhost:8080
Summary:
Total: 7.1151 secs
Slowest: 2.9556 secs
Fastest: 0.0102 secs
Average: 0.2326 secs
Requests/sec: 140.5466
Total data: 11000 bytes
Size/request: 11 bytes
Response time histogram:
0.010 [1] |
0.305 [939] |________________________________________
0.599 [4] |
0.894 [0] |
1.188 [21] |_
1.483 [15] |_
1.777 [8] |
2.072 [5] |
2.367 [1] |
2.661 [0] |
2.956 [6] |
Latency distribution:
10% in 0.0409 secs
25% in 0.0946 secs
50% in 0.1819 secs
75% in 0.2225 secs
90% in 0.2433 secs
95% in 1.1023 secs
99% in 2.0323 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0582 secs, 0.0102 secs, 2.9556 secs
DNS-lookup: 0.0007 secs, 0.0000 secs, 0.0132 secs
req write: 0.0001 secs, 0.0000 secs, 0.0074 secs
resp wait: 0.1724 secs, 0.0099 secs, 1.9005 secs
resp read: 0.0019 secs, 0.0000 secs, 0.1229 secs
Status code distribution:
[200] 1000 responses
Thank you for the PR and the notes @varunbpatil! I just have one request: keep the req object available, to maintain backward compatibility.
Hi @andresmgot, sorry for the delay in fixing this. I have made changes to preserve the request object so that custom headers, query strings, forms, file uploads, etc. can be accessed in user code. The changes work, but the result is starting to look like a hack that requires understanding the internals of the Bottle framework. I can't think of any other way to make the request object serializable.
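The serialization issue can be illustrated without Bottle itself. The sketch below is hypothetical (the helper name and the key list are illustrative, not taken from the PR): it copies only plain, picklable request fields out of a WSGI environ so they can cross a process boundary, dropping handles such as wsgi.input that cannot be pickled.

```python
import pickle

# Keys of the WSGI environ that are plain strings and safe to pickle.
# (Illustrative subset; the actual PR patches Bottle internals instead.)
PICKLABLE_KEYS = ("REQUEST_METHOD", "PATH_INFO", "QUERY_STRING",
                  "CONTENT_TYPE", "CONTENT_LENGTH")

def serializable_request(environ, body=b""):
    """Hypothetical helper: reduce a request to a picklable dict."""
    data = {k: v for k, v in environ.items()
            if k in PICKLABLE_KEYS or k.startswith("HTTP_")}
    data["BODY"] = body  # raw bytes pickle fine; file-like objects do not
    return data

environ = {
    "REQUEST_METHOD": "POST",
    "PATH_INFO": "/",
    "QUERY_STRING": "a=1",
    "HTTP_X_CUSTOM": "yes",
    "wsgi.input": object(),  # unpicklable stream, must be dropped
}
req = serializable_request(environ, b'{"hello": "world"}')
blob = pickle.dumps(req)  # succeeds: only plain types remain
```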
Maybe a better design would be to run multiple single-threaded servers behind a reverse proxy to handle multiple requests in parallel (i.e., a pre-fork model). That comes with its own problems: handling timeouts, managing multiple server processes, synchronized logging, etc.
I am thinking this PR should be sufficient to fix this somewhat critical bug while maintaining backward compatibility. Maybe we can re-evaluate the design in the near future.
I would be glad to hear your thoughts on this.
it seems that the test get-python-deps is failing with the new image, can you check it @varunbpatil?
Will check. Seems like something to do with Prometheus.
do you mind adding yourself to the OWNERS file?
Will do.
Hi @andresmgot, I have fixed the issue. The Prometheus metric variables needed to be moved inside if __name__ == '__main__' because they can only be created once. I missed this earlier since I had commented out some of the code. I have also added myself to the OWNERS file.
I have force-pushed my changes, so your change to the image sha256 will have to be re-applied since the image will be different. Sorry for the trouble.
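The "can only be created once" behavior can be sketched without prometheus_client: its default registry rejects a duplicate metric name, so metrics created at module import time fail when the module is effectively imported a second time (for example by a spawned worker process). The Registry class below is a stand-in for illustration, not the real library:

```python
class Registry:
    """Hypothetical stand-in for a metrics registry that, like
    prometheus_client's CollectorRegistry, rejects duplicate names."""
    def __init__(self):
        self._names = set()

    def register(self, name):
        if name in self._names:
            raise ValueError("Duplicated timeseries: " + name)
        self._names.add(name)

REGISTRY = Registry()

def create_metrics():
    # Called exactly once from under `if __name__ == '__main__':` in the
    # server; creating metrics at import time would run on every import.
    REGISTRY.register("function_calls_total")

create_metrics()        # first creation succeeds
try:
    create_metrics()    # simulates a second import of the module
    err = None
except ValueError as exc:
    err = str(exc)
```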
Hi @andresmgot, I am not sure what the current failure is. I tried with the new Docker images on a local Kubernetes cluster and it works for me.
[~/runtimes/stable/python] $ make get-python-deps
cd examples && zip hellowithdeps.zip hellowithdeps.py hellowithdepshelper.py
/home/varun.patil/runtimes/stable/python/examples
updating: hellowithdeps.py (stored 0%)
updating: hellowithdepshelper.py (deflated 29%)
kubeless function deploy get-python-deps --runtime python3.6 --handler hellowithdeps.foo --from-file examples/hellowithdeps.zip --dependencies examples/requirements.txt --image-pull-policy IfNotPresent
INFO[0000] Deploying function...
INFO[0000] Function get-python-deps submitted for deployment
INFO[0000] Check the deployment status executing 'kubeless function ls get-python-deps'
[~/runtimes/stable/python] $ make get-python-deps-verify
kubectl rollout status deployment/get-python-deps && sleep 2
deployment "get-python-deps" successfully rolled out
kubeless function call get-python-deps |egrep Google
Google
Is there a way to get the deployment status?
Let me check; I am going to run the tests over SSH and report back the issue.
So it works when run manually. It seems that even though the pod is running, it is not able to accept requests right away. Note that there is no error in the logs, so the failing request is not even handled:
circleci@default-dca24841-1afb-473b-90f7-86c13f586b8c:~/project/stable/python$ make get-python-deps-verify
kubectl rollout status deployment/get-python-deps && sleep 2
deployment "get-python-deps" successfully rolled out
kubeless function call get-python-deps |egrep Google
Google
circleci@default-dca24841-1afb-473b-90f7-86c13f586b8c:~/project/stable/python$ kubectl logs get-python-deps-6b694c6f98-2hn6d
Bottle v0.12.13 server starting up (using CherryPyServer(numthreads=10))...
Listening on http://0.0.0.0:8080/
Hit Ctrl-C to quit.
172.17.0.1 - - [28/Sep/2020:09:05:26 +0000] "GET /healthz HTTP/1.1" 200 2 "" "kube-probe/1.15" 0/237
172.17.0.1 - - [28/Sep/2020:09:05:56 +0000] "GET /healthz HTTP/1.1" 200 2 "" "kube-probe/1.15" 0/139
172.17.0.1 - - [28/Sep/2020:09:06:26 +0000] "GET /healthz HTTP/1.1" 200 2 "" "kube-probe/1.15" 0/142
172.17.0.1 - - [28/Sep/2020:09:06:30 +0000] "GET / HTTP/1.1" 200 6 "https://10.142.4.173:8443/api/v1/namespaces/default/services/get-python-deps:http-function-port/proxy" "kubeless/v0.0.0 (linux/amd64) kubernetes/$Format" 0/80126
Adding a small sleep should be enough.
By the way, pull before pushing more changes so the change in the jsonnet doesn't get lost.
But there is already a 2 second sleep for every function. Do you want me to increase it for every function, or just for the first function (hopefully the others will be ready to accept requests by then)?
By the way, pull before pushing more changes so the change in the jsonnet doesn't get lost.
Will do.
I'll also add a dump of the K8s objects on failure similar to https://github.com/kubeless/kubeless/blob/25d6376c91730f0fecb4c6f318b368dc4ce4eb03/script/integration-tests#L86-L104
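Such a dump could look like the hypothetical helper below (the exact resources and label selector are assumptions; the linked integration-tests script is the authoritative version):

```shell
# Hypothetical failure handler, modeled on the linked script: dump
# cluster state on test failure so CI logs show what went wrong.
dump_k8s_state() {
    echo "=== pods ==="
    kubectl get pods --all-namespaces -o wide
    echo "=== deployment ==="
    kubectl describe deployment "$1"
    echo "=== function logs ==="
    kubectl logs -l function="$1" --tail=100
}
# Usage on failure: dump_k8s_state get-python-deps
```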
But there is already a 2 second sleep for every function. Do you want me to increase it for every function, or just for the first function (hopefully the others will be ready to accept requests by then)?
just for that test should be fine
Hi @andresmgot, increased the sleep duration to 5s for just the first test. CI is passing.
Great, thanks!
Drop support for the Python 2.7 runtime: it has reached end of life, and the "spawn" and "forkserver" multiprocessing contexts required to fix the deadlock are not supported on Python 2.7.
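For context on the start methods mentioned above, a minimal check of what Python 3's multiprocessing exposes (nothing kubeless-specific is assumed):

```python
import multiprocessing as mp

# "spawn" starts a brand-new interpreter for each worker process, so
# locks held by the parent's server threads are never copied into the
# child; this is the mechanism the deadlock fix relies on. Python 2.7
# only offers fork(), which is why the 2.7 runtime is dropped.
assert "spawn" in mp.get_all_start_methods()
ctx = mp.get_context("spawn")  # a context whose Process/Pool use spawn
```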
Fixes #94