paul-crease opened this issue 6 years ago
@paul-crease
For this snippet:

```python
from clipper_admin import ClipperConnection, KubernetesContainerManager
from subprocess import Popen, PIPE

print("Connecting...")
clipper_host_public_ip = Popen(['minikube', 'ip'], stdout=PIPE).communicate()[0].strip()

print("Listing apps...")
clipper_conn = ClipperConnection(KubernetesContainerManager(kubernetes_api_ip=clipper_host_public_ip, useInternalIP=True))
clipper_conn.connect()
print(clipper_conn.get_all_apps())
```
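One note on the snippet above (an aside, with a hypothetical captured value): on Python 3, `Popen(...).communicate()[0]` returns `bytes`, so the minikube IP should be decoded before being passed on as `kubernetes_api_ip`:

```python
# Hypothetical stand-in for the bytes that `minikube ip` writes to stdout;
# on Python 3, Popen.communicate()[0] returns bytes, not str.
raw_stdout = b"192.168.99.100\n"

# Decode and strip before using the value as kubernetes_api_ip.
clipper_host_public_ip = raw_stdout.decode("utf-8").strip()
print(clipper_host_public_ip)  # -> 192.168.99.100
```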
If I add `clipper_conn.start_clipper()` before `clipper_conn.connect()` (running Clipper for the first time), I was able to start Prometheus in my minikube environment; the logs show:

```
level=info ts=2018-04-20T09:02:36.385822391Z caller=main.go:585 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-04-20T09:02:36.386613331Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
```

Can you try `clipper_conn.start_clipper()` and see if that works?
@simon-mo - Thanks for the suggestion. I updated the code as suggested, but had no luck:

```python
from clipper_admin import ClipperConnection, KubernetesContainerManager
from subprocess import Popen, PIPE

print("Connecting...")
clipper_host_public_ip = Popen(['minikube', 'ip'], stdout=PIPE).communicate()[0].strip()

print("Listing apps...")
clipper_conn = ClipperConnection(KubernetesContainerManager(kubernetes_api_ip=clipper_host_public_ip, useInternalIP=True))
clipper_conn.start_clipper()
clipper_conn.connect()
print(clipper_conn.get_all_apps())
```
Again I just get the error message below, repeatedly:

```
18-04-20:11:32:30 INFO [clipper_admin.py:112] Clipper still initializing.
```

I also tried removing Clipper completely and then re-running the suggested code, but I get the same result as before for the metrics logging.
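For context on what that repeated message means (an illustrative sketch of the general pattern, not clipper_admin's actual implementation): the connect step keeps polling the cluster and logs `Clipper still initializing.` on each failed check, so the message loops forever when the query frontend never becomes reachable. A generic readiness loop of that shape looks like:

```python
import time

def wait_until_ready(check, timeout_s=60, interval_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s elapses.

    check, clock, and sleep are injectable so the loop can be exercised
    without real waiting; this mirrors the shape of readiness polling,
    not clipper_admin's real code.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if check():
            return True
        print("Clipper still initializing.")
        sleep(interval_s)
    return False
```

If `check()` can never succeed (e.g. the service ports are unreachable), the loop just logs until the timeout, which matches the behaviour reported here.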
@paul-crease I tried this with minikube v0.26.1, K8s v1.10.0 on a Mac:

```python
from clipper_admin import ClipperConnection, KubernetesContainerManager
from subprocess import Popen, PIPE

clipper_host_public_ip = Popen(['minikube', 'ip'], stdout=PIPE).communicate()[0].strip()

print("Listing apps...")
clipper_conn = ClipperConnection(KubernetesContainerManager(kubernetes_api_ip=clipper_host_public_ip, useInternalIP=True))
clipper_conn.start_clipper()
clipper_conn.connect()
print(clipper_conn.get_all_apps())
```

with success. It did take quite a long time, because the local machine is pulling the images: ~3 minutes at an average download speed of 7-9 Mb/s for the fresh install.
You can run `kubectl get pods` and then `kubectl describe po/{pod-NAME} | tail` to see whether the pulling is currently happening. For example, I got results like this:
```
$ kubectl describe po/metrics-7d577dbc99-b8qtr | tail
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age   From               Message
  ----    ------                 ----  ----               -------
  Normal  Scheduled              2m    default-scheduler  Successfully assigned metrics-7d577dbc99-b8qtr to minikube
  Normal  SuccessfulMountVolume  2m    kubelet, minikube  MountVolume.SetUp succeeded for volume "config-volume"
  Normal  SuccessfulMountVolume  2m    kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-cgsxx"
  Normal  Pulling                2m    kubelet, minikube  pulling image "prom/prometheus:v2.1.0"
```
It should also show `Waiting: ContainerCreating` on the Dashboard.
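As an aside, this check can be scripted; here is a rough illustrative helper (not part of clipper_admin, and event wording varies across Kubernetes versions, so treat it as a heuristic) that scans `kubectl describe po/<name> | tail` output for an image-pull event:

```python
def pod_is_pulling(describe_tail: str) -> bool:
    """Return True if the pod's recent events include an image pull.

    Purely a text scan over `kubectl describe po/<name> | tail` output;
    a heuristic sketch, not an API call.
    """
    return any("Pulling" in line for line in describe_tail.splitlines())

# Sample event line taken from the describe output above.
sample = 'Normal  Pulling  2m  kubelet, minikube  pulling image "prom/prometheus:v2.1.0"'
print(pod_is_pulling(sample))  # -> True
```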
What's the output when you run `minikube config view` in the terminal? I'm wondering whether any configuration flags were set, especially flags like:

- `registry`
- `--bootstrapper=kubeadm`
- `apiserver.Authorization.Mode=RBAC`
Lastly, this might (with low probability) be a VirtualBox issue (I noticed the node is 10.0.2.15). If you have Docker installed on your machine, can you try `minikube start --insecure-registry localhost:5000 --vm-driver hyperkit`? This will use Docker's hypervisor to run the cluster instead of creating the VM in VirtualBox.

Thank you for your patience.
Again, thanks for the quick response. I am using OSX v10.11.5.

"I'm wondering whether any configuration flags were set" - I just used the defaults; `minikube config view` returns nothing.

`minikube start --insecure-registry localhost:5000 --vm-driver hyperkit` - this still results in the same behaviour.

`kubectl get pods all` - I get the same results; everything is Running after a couple of minutes, but the problem persists.
@chester-leung is going to try to reproduce this on OSX.
@paul-crease I'm able to reproduce this on OSX 10.12.6.
I'll look further into this and let you know what I find.
@paul-crease Thanks for your patience. @chester-leung and I were able to figure out the issue.
When `minikube` runs on VirtualBox (which is the default option), all the ports are closed. Users have to use `kubectl proxy` to access the Kubernetes API and all the services. Clipper reports `Clipper still initializing.` because it can't access the query frontend. The fix:

- `minikube delete` to delete the current Kubernetes cluster
- `minikube start --vm-driver hyperkit` to start a brand-new minikube cluster with hyperkit.

Hello. Thank you for the solution; it now works as expected. Another solution I found was to downgrade minikube to 0.25.1, which then uses K8s v1.9.4 by default. This also solves the problem.
Thanks for this useful topic.
I have a similar issue. I'm installing Clipper (the latest version) on Google Kubernetes Engine.
Initially I also was getting `Clipper still initializing.` in the python3 CLI after running `clipper_conn.start_clipper()`, but using `kubectl proxy --port 8080` and providing `kubernetes_proxy_addr` in KubernetesContainerManager's constructor lets it succeed. For example, I can now see the registered apps with `clipper_conn.get_all_apps()`.
However, the problem hasn't gone away.
I see a lot of log messages like

```
2018-07-11 13:24:33.000 EEST
level=error ts=2018-07-11T10:24:33.258312991Z caller=main.go:221 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:296: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:clipper:default\" cannot list pods at the cluster scope: Unknown user \"system:serviceaccount:clipper:default\""
```

in the metrics container's log output.
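For reference, that message says the `default` serviceaccount in the `clipper` namespace lacks RBAC permission to list pods cluster-wide, which Prometheus's Kubernetes service discovery needs. The usual shape of the missing grant is a ClusterRoleBinding along these lines (illustrative only: the binding name is made up, and the built-in read-only `view` role is one possible choice, which you may want to narrow further):

```yaml
# Illustrative RBAC grant: allow Prometheus, running as the clipper:default
# serviceaccount, to read pods and services cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: clipper-default-view   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                   # built-in read-only ClusterRole
subjects:
- kind: ServiceAccount
  name: default
  namespace: clipper
```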
In addition, after issuing the command

```python
python_deployer.deploy_python_closure(clipper_conn, name="sum-model", version=1, input_type="doubles", func=feature_sum)
```

as described here, I see a `sum-model-1-deployment-at-0-at-tes` deployment with the status `0 of 1 updated replicas available - ImagePullBackOff`, and after some time the command fails with the message:
```
INFO [clipper_admin.py:474] [test] Pushing model Docker image to test-sum-model:1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/clipper_admin/deployers/python.py", line 222, in deploy_python_closure
    registry, num_replicas, batch_size, pkgs_to_install)
  File "/usr/local/lib/python3.5/dist-packages/clipper_admin/clipper_admin.py", line 352, in build_and_deploy_model
    num_replicas, batch_size)
  File "/usr/local/lib/python3.5/dist-packages/clipper_admin/clipper_admin.py", line 560, in deploy_model
    num_replicas=num_replicas)
  File "/usr/local/lib/python3.5/dist-packages/clipper_admin/kubernetes/kubernetes_container_manager.py", line 393, in deploy_model
    name=deployment_name, namespace=self.k8s_namespace).status.available_replicas \
  File "/usr/local/lib/python3.5/dist-packages/kubernetes/client/apis/extensions_v1beta1_api.py", line 5758, in read_namespaced_deployment_status
    (data) = self.read_namespaced_deployment_status_with_http_info(name, namespace, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/kubernetes/client/apis/extensions_v1beta1_api.py", line 5843, in read_namespaced_deployment_status_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.5/dist-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.5/dist-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.5/dist-packages/kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "/usr/local/lib/python3.5/dist-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.5/dist-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '3e877d67-962f-40bc-82b9-0733c5a4bbe5', 'Date': 'Tue, 10 Jul 2018 23:29:06 GMT', 'Content-Length': '129', 'Content-Type': 'application/json', 'Www-Authenticate': 'Basic realm="kubernetes-master"'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
```
Could you suggest how to solve this?
I am unable to run Clipper on K8s version 1.10.0 (using minikube). Prometheus seems not to have the correct permissions.

Steps to reproduce: install minikube v0.26.1 (K8s v1.10.0), start minikube with

```
minikube start --insecure-registry localhost:5000
```

then run the Python code to init the Clipper cluster on K8s:

Expected result: all components of Clipper are installed, in a running state, and queryable.

Actual result: the CLI output simply repeats

```
[clipper_admin.py:112] Clipper still initializing.
```

The K8s logs for the metrics pod contain the following

The K8s dashboard shows the pods are running, but queries hang, e.g. listing apps with

throws the following error: